Chapter 1
INTRODUCTION


This investigation is concerned with developing new techniques
for  tracking  images:  specifically,  sampled  and  digitized  images.  The 
major  thrust  of  this  study  is  that  there  are  a number  of  ways  to 
improve  a tracking  system's  performance  in  the  presence  of  noise. 
Potential  improvements  in  system  performance  can  be  traded  off  for 
lower  system  cost  or  retained  for  higher  speed,  improved  accuracy,  and 
a larger  class  of  trackable  images. 

1.1  Research  Objectives  and  Approach 

The  primary  goals  of  this  research  are  to  develop  and  evaluate 
techniques  for  tracking  sequences  of  digitized  images.  The  approach  to 
be  used  in  searching  for  improved  tracking  techniques  will  be  to  first 
develop  a meaningful  signal-to-noise  ratio  by  examining  the 
characteristics  of  a minimum  norm  similarity  measure  for  digitized 
images,  and  then  investigate  ways  to  increase  the  signal-to-noise  ratio 
through  "intelligent"  processing.  By  adopting  a model  for  a generalized 
image-tracking system which is partitioned by task, and examining each
task  for  ways  to  improve  the  effective  signal-to-noise  ratio,  the 
overall  system  performance  may  be  improved.  It  is  essential  to 
recognize  the  importance  of  interactions  among  the  various  tasks  or 
operations within the tracking system and to avoid employing
innovations in one area which might be detrimental to the total system.


1.2 The Image Tracking Task


Consider for a few moments the problems of building a machine
to  play  tennis.  Obviously,  there  are  a number  of  functions  that  must  be 
carried  out  by  this  machine  if  it  is  to  do  any  more  than  simply  serve 
the  ball.  It  must  certainly  be  able  to  move  around  the  court  with 
sufficient  speed  to  return  the  ball;  it  must  be  able  to  swing  a racket 
and  hit  the  ball;  but  above  all,  it  must  be  able  to  track  the  ball  with 
sufficient  accuracy  to  be  able  to  predict  where  and  when  to  swing  the 
racket. 

If  we  build  this  machine  with  an  electronic  eye,  how  will  it 
actually  track  the  ball?  To  answer  this  question,  we  must  investigate 
the  various  pieces  of  the  general  machine-implemented  image-tracking 
task. 

The  tracking  task  can  be  thought  of  as  consisting  of  two 
separable  and  sequential  subtasks:  acquiring  the  target  and  tracking 
the  acquired  target  image.  Before  an  object  or  target  can  be  tracked, 
it must be located within the image. For a human, this may be simple,
but  even  this  almost  unconscious  action  consists  of  a sequence  of  tasks 
which  are  quite  difficult  for  a machine.  An  orderly  search  must  be 
conducted  using  stored  data  about  where  the  target  is  likely  to  be. 
Then,  for  each  object  found,  the  recognition  or  classification  task 
must  be  attempted,  perhaps  with  the  aid  of  contextual  information.  This 
problem, while it may be trivial for a 10-year-old, is very difficult
to  solve  using  a machine.  In  fact,  for  most  current  applications,  a 
human  operator  must  still  perform  the  acquisition  or  target  designation 
task  for  the  tracking  machine. 

1.3 A Model of an Image-Tracking System

Once  an  object  to  be  tracked  is  found  in  the  image,  how  is  the 
actual  tracking  accomplished?  The  particular  algorithm  used  by  the 
human  eye/brain  combination  is  not  known,  but  one  possible  model  of  a 
generalized image-tracking system is illustrated in Figure 1. The
components  of  the  model  are  discussed  in  the  following  paragraphs  to 
provide  some  insight  into  the  problems  of  mechanizing  an  image-tracking 
machine. The sequence of steps to be followed by the machine is as
follows:

I.  Reference  image  initialization 

II.  Image  sensing 

III.  Image  motion  compensation 

IV.  Preprocessing,  consisting  of 

A.  Filtering 

B.  Enhancement 

C.  Feature  extraction 

D.  Comparison  set  selection 

V.  Similarity  detection 

VI.  Reference  image  update 

VII.  Target  modeling  and  target  state  prediction 

VIII. Sensor movement to maintain the target in the field of view

In  the  following  subsections,  each  of  these  steps  will  be 
discussed  separately.  Due  to  the  interrelationships  which  exist 
between  the  various  steps,  some  definitions  will  be  deferred  to  each 
subsection. 

Figure 1. Model of an Image Tracking System


1.3.1  Reference  Image  Initialization 

The solution of the acquisition problem implies that an initial
reference image exists since there must be something with which to
compare the data image. For some systems this may be a photograph or a
computer generated image of the target; it could be a prestored set of
target characteristics; or, it might be one of the raw sensor images.

In  order  to  track  the  largest  possible  class  of  images,  it  seems 
essential  that  the  initial  reference  be  derived  from  the  raw  sensor 
data.  In  Chapter  5 we  will  concentrate  on  extracting  the  best  possible 
reference image from the sensor data.

The  ability  of  a tracking  system  to  maintain  track  on  a 
particular  target  is  strongly  dependent  on  the  quality  of  the  reference 
image. If the initial reference image is not similar to the current sensed
data image due to image motion or rapidly changing viewing angle, the
similarity detection process (see Section 1.3.5) may not find a "best
match" which is satisfactory, and the updated reference image may
not represent the true target any more closely than did the initial
reference image. In other words, a "trackable" target image is defined
by having a reference image sequence which "converges" to the target
image.

1.3.2  Image  Sensing 

Many different types of sensors can be used in image-tracking systems:
vidicons, Correlatrons (a registered trademark of the Goodyear Corp.),
charge-coupled devices (CCD's), charge-injection devices (CID's),
one-dimensional and two-dimensional photodiode arrays, single-detector
line-scanning systems, and many more
that are still in the early stages of development. Among the sensors
available  today,  there  are  a variety  of  scanning  methods  and 
sensitivities  with  a wide  range  of  resolution,  geometric  fidelity, 
spectral response, and speed. Noise characteristics also vary from one
sensor type to another. In some cases, the noise
characteristics  can  vary  within  a single  device  due  to  environmental  or 
operating  conditions,  or  due  to  manufacturing  methods.  For  example,  a 
large  area  two-dimensional  array  sensor  might  be  made  up  of  many 
smaller  arrays,  each  of  which  has  a characteristic  noise  which  is 
uniform  over  that  small  device;  or  a CCD  sensor  may  have  a nonuniform 
sensitivity over its active area, thus producing what is known as
"fixed pattern noise" [7]. As a result, care must be taken in
designing  a tracking  algorithm  or  preprocessor  to  assure  compatibility 
with  the  sensor. 


1.3.3  Image  Motion  Compensation 

In  the  general  case,  the  sensor  may  be  both  translating  and  rotating 
with  respect  to  the  target,  but  for  the  cases  that  we  will  be 
investigating, it is assumed that image motion on a frame-to-frame
basis  consists  of  translation  only.  There  are  a number  of  reasons  why 
this  assumption  seems  to  be  well  founded  for  a large  fraction  of  the 
tracking  problems  of  interest.  First,  sensors  that  are  stabilized  in 
either  pitch  and  yaw  or  azimuth  and  elevation  on  a moving  vehicle  will 
observe  only  small  components  of  roll  about  the  sensor  optical  axis  as 
long as the vehicle is roll stabilized [21]. Second, when the sensor
optical axis is pointed away from the instantaneous angular velocity
vector by more than one half of the sensor instantaneous field of view,
the center of rotation for the image is not contained within the image
itself. Thus, the resulting apparent image motion is in large part
translation  perpendicular  to  the  vector  from  the  center  of  the  sensor 
field  of  view  toward  the  instantaneous  center  of  rotation  of  the  image 
(see  Figure  2).  Third,  for  reasons  that  will  be  discussed  later  (see 
Section  2.2.1)  the  majority  of  the  sensor  instantaneous  field  of  view 
is  masked  out  by  the  tracker  and  only  a small  subsection  of  the 
available  image  is  passed  to  the  similarity  detector.  This  small 
region  is  referred  to  as  the  field  of  regard  or  gated  region  of  the 
image.  This  process  results  in  a considerably  smaller  effective  field 
of  view  than  even  the  instantaneous  field  of  view  of  the  sensor,  thus 
minimizing  the  error  that  can  occur  when  correcting  for  rotation  through 
translational  adjustments. 

An important part of any image-tracking scheme is the sensing
of  and  compensation  for  the  sensor  motion.  The  size  of  the  search 
region  within  an  image  depends  on  the  uncertainty  in  the  relative 
motion of the target with respect to the sensor optical axis. Any data
which  can  be  used  to  reduce  the  relative  motion,  or  compensate  for  it, 
will  have  the  effect  of  limiting  the  required  search  area  for  the 
similarity  detection  process.  Reduction  of  the  search  area  is  directly 
translatable  into  an  increase  in  the  available  computation  time  for 
each  search  point  since  fewer  comparisons  need  to  be  made  for  each  new 
data  frame. 

Figure 2. Image Motion Due to Rotation About a Point
External to the Instantaneous Field of View

1.3.4 Preprocessing

The preprocessing step in an image-tracking system has historically
been necessary because of the relatively poor performance of the
similarity detection process when operating on raw imagery [8].
Typical preprocessing functions are amplitude normalization, signal
enhancement,  noise  reduction  through  linear  and  nonlinear  filtering, 
filtering  in  the  Walsh  or  Fourier  domain,  and  other  feature  extraction 
techniques.  The  resulting  image  is  generally  tailored  to  the 
requirements  of  the  similarity  detection  process.  The  design  of  the 
preprocessing  function  is  still  more  of  an  art  than  a science,  and  the 
success  of  most  of  the  presently  available  image-tracking  systems  can 
be  attributed  in  large  part  to  the  insight  of  the  preprocessor  design 
engineer  into  the  characteristics  of  the  sensor,  the  operating 
environment, the typical imagery that would be encountered in the
field,  and  the  similarity  detection  algorithm  that  is  to  be  employed. 

It is significant to note that the noise statistics in the raw data
image are generally determined by the sensor physics and the signal
detection and processing electronics. But, whatever the characteristics
of  the  sensor  noise,  the  preprocessor  can  modify  them.  This  fact  also 
enables  the  system  designer  to  tailor  the  noise  statistics  if  the 
result  is  compatible  with  the  remainder  of  the  image-tracking  process. 
For  example,  noise  in  a sampled  raw  data  image  can  be  made 
approximately  Gaussian  by  processing  the  raw  data  image  with  a moving- 
window  averaging  operator  if  the  window  contains  a sufficient  number  of 
points  for  the  central  limit  theorem  to  be  applied  [26]. 
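
The central-limit effect described above can be checked numerically. The following is a minimal sketch, assuming independent, uniformly distributed noise and a 5 x 5 window; both choices are illustrative and are not taken from this report, and the function names are hypothetical. Excess kurtosis, which is zero for a Gaussian distribution, is used here as a rough indicator of how nearly Gaussian the noise is.

```python
import numpy as np

def moving_average(image, window):
    """Average each pixel over a window x window neighborhood (valid region only)."""
    # A summed-area table yields every window sum with four lookups.
    padded = np.pad(image, ((1, 0), (1, 0)))
    sat = padded.cumsum(axis=0).cumsum(axis=1)
    sums = (sat[window:, window:] - sat[:-window, window:]
            - sat[window:, :-window] + sat[:-window, :-window])
    return sums / (window * window)

def excess_kurtosis(samples):
    """Excess kurtosis of a 1-D sample; zero for a Gaussian distribution."""
    centered = samples - samples.mean()
    return (centered ** 4).mean() / (centered ** 2).mean() ** 2 - 3.0

rng = np.random.default_rng(0)
# Assumed noise model: uniform on [-1, 1], which is clearly non-Gaussian.
raw_noise = rng.uniform(-1.0, 1.0, size=(256, 256))
smoothed = moving_average(raw_noise, window=5)

raw_kurtosis = excess_kurtosis(raw_noise.ravel())
smoothed_kurtosis = excess_kurtosis(smoothed.ravel())
print(f"raw: {raw_kurtosis:.3f}  smoothed: {smoothed_kurtosis:.3f}")
```

Under these assumptions the excess kurtosis of the raw noise is close to the theoretical value of -1.2 for a uniform distribution, and it moves toward zero after averaging. Note that the averaging operator also correlates the noise in neighboring pixels, which may matter to the later stages of the tracker.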

1.3.5 Similarity Detection

The  similarity  detection  process  compares  the  reference  image  with  a 
number  of  candidate  subimages  taken  from  the  data  image.  The  result  of 
each comparison is a similarity measure. The location of the subimage
which produces the maximum similarity measure (see Sections 2.2.1,
2.2.2, and 3.9) is taken as the current estimate of the location of the
"target"  as  represented  by  the  most  recent  reference  image.  The 
similarity  detection  process  is  the  heart  of  any  image-tracking 
algorithm.  Some  tracking  algorithms  produce  only  a direction  from  the 
center  of  the  data  image  to  the  assumed  location  of  the  best  match 
subimage  without  actually  finding  the  best  match  (characteristic  of 
most  analog  point-trackers,  see  Section  2.2.1);  other  tracking 
algorithms  carry  out  the  search  for  the  most  similar  subimage  and 
output  its  true  location. 

An  exhaustive  examination  of  the  subimages  within  the  search 
region  requires  a period  of  time  which  grows  linearly  with  the  area  of 
the  search  region  or  as  the  square  of  the  search  radius.  As  a result, 
any  new  similarity  detection  algorithm  which  improves  the  speed  for  a 
single  similarity  measurement  operation  will  allow  either  an  increase 
in  the  rate  at  which  data  frames  can  be  processed  for  a fixed  search 
region  size,  or  an  increase  in  the  size  of  the  search  region  for  a 
given  data  image  rate. 
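
The growth of the search cost with the search radius can be illustrated with a short sketch. It assumes a square search region and uses the sum of absolute differences (an L1 norm, so the smallest value marks the most similar subimage) as the similarity measure; the function name, the image sizes, and the noise level are hypothetical choices made only for this illustration.

```python
import numpy as np

def exhaustive_search(reference, data_image, center, radius):
    """Compare the reference with every subimage whose top-left corner lies
    within `radius` pixels of `center`; return the best corner and the
    number of comparisons performed."""
    h, w = reference.shape
    best_score, best_loc, comparisons = np.inf, None, 0
    for r in range(center[0] - radius, center[0] + radius + 1):
        for c in range(center[1] - radius, center[1] + radius + 1):
            # Skip candidate subimages that fall outside the data image.
            if r < 0 or c < 0 or r + h > data_image.shape[0] or c + w > data_image.shape[1]:
                continue
            sub = data_image[r:r + h, c:c + w]
            score = np.abs(sub - reference).sum()  # L1 norm of the difference
            comparisons += 1
            if score < best_score:
                best_score, best_loc = score, (r, c)
    return best_loc, comparisons

rng = np.random.default_rng(1)
data_image = rng.random((64, 64))
# The reference is a noisy copy of the subimage whose corner is at (20, 30).
reference = data_image[20:28, 30:38] + 0.01 * rng.standard_normal((8, 8))

small_loc, small_count = exhaustive_search(reference, data_image, (18, 33), radius=4)
large_loc, large_count = exhaustive_search(reference, data_image, (18, 33), radius=8)
print(small_loc, small_count, large_loc, large_count)
```

Doubling the radius from 4 to 8 raises the number of comparisons from 81 to 289, so a faster single comparison can be spent either on a larger search region or on a higher frame rate.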

1.3.6  Reference  Image  Update 

A fixed  reference  image  may  be  possible  for  those  systems  which  are 
required  to  track  objects  which  do  not  change  their  characteristics  or 
which  can  be  modeled  as  not  changing  their  characteristics  while  they 
are  being  tracked.  For  our  tennis-playing  machine,  it  might  be  possible 
to  model  the  tennis  ball  as  a circle  which  never  changes  size  and 
adjust  the  scale  factor  of  the  sensor  output  to  make  the  received  image 
of  the  ball  match  the  reference  image.  An  alternate  approach  is  to 
change the reference image to incorporate the apparent change in the
size of the tennis ball. A refinement of the simple circular model for
the  tennis  ball  might  include  the  location  of  the  seams.  This  might 
make  possible  the  estimation  of  the  spin  on  the  ball  but  would  require 
that  the  reference  image  be  updated  to  reflect  the  change  in  the 
location  of  the  seams.  The  spin  would  then  be  estimated  from  the  frame- 
to-frame  change  in  the  location  and  orientation  of  the  seams.  For  an 
image-tracking  system  which  maintains  an  actual  image  as  the  reference 
rather  than  a table  of  characteristics,  the  problem  of  updating  the 
reference  is  equally  complex.  The  basic  decision  that  must  be  made  is 
how much of an observed difference between the reference image and a
data  image  is  due  to  a change  in  the  actual  scene  and  how  much  is  due 
to  noise.  In  the  past,  the  quality  of  the  match  between  the  reference 
image  and  the  data  image  has  been  used  to  indicate  when  the  reference 
image  needed  to  be  updated.  The  actual  process  of  updating  the 
reference  has  varied  with  the  particular  system.  Some  systems  update  by 
getting the next reference image from a storage file [37]. This type of
system  is  obviously  limited  to  tracking  objects  and  images  that  are 
both  definable  ahead  of  time  and  not  changing  with  time.  Other 
systems,  once  they  decide  to  update  the  reference,  simply  use  the  most 
recent data image as a new reference image [40]. In general, updating
the  reference  image  in  a system  without  prestored  reference  data  is  a 
difficult  problem.  The  criteria  for  evaluating  the  update  scheme  vary 
from  system  to  system  depending  on  the  particular  application.  The 
update  problem  is  one  of  measuring  the  changes  between  two  images  that 
are  declared  to  be  "most  similar"  by  the  similarity  detector  and 
deciding  which  changes  are  real  and  which  changes  are  due  to  noise 
phenomena. The similarity detector compares the given reference with a
set of trial images and picks the "most similar" image for use by the
reference  update  algorithm.  For  this  reason,  the  reference  image  update 
task  is  treated  as  separable  from,  and  independent  of,  the  similarity 
detection  task. 
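
The simplest update policy mentioned above, replacing the reference with the most recent data image once the match quality degrades, can be sketched as follows. The mean absolute difference used as the quality measure, the threshold, and the optional blending weight are all assumptions introduced for illustration; they are not the update scheme developed in this report.

```python
import numpy as np

def update_reference(reference, best_match, threshold, weight=1.0):
    """Return (new_reference, updated).

    The reference is changed only when the mean absolute difference between
    it and the best-match subimage exceeds `threshold` (assumed criterion).
    weight=1.0 replaces the reference outright; a smaller weight blends the
    new data into the old reference (hypothetical variation).
    """
    mismatch = np.abs(best_match - reference).mean()
    if mismatch <= threshold:
        return reference, False
    new_reference = (1.0 - weight) * reference + weight * best_match
    return new_reference, True

reference = np.zeros((4, 4))
kept, was_updated = update_reference(reference, np.full((4, 4), 0.1), threshold=0.5)
replaced, was_replaced = update_reference(reference, np.ones((4, 4)), threshold=0.5)
print(was_updated, was_replaced)
```

A single threshold cannot distinguish a real scene change from noise, which is the basic decision described above; the sketch only shows where such a decision enters the loop.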

1.3.7 Target Modeling and Target State Prediction

An output of the similarity detection block in Figure 1 is the location
of  the  "best  match"  between  the  reference  image  and  the  data  image.  In 
a classic  feedback  control  system,  this  output  signal  would  be  used  to 
drive  sensor  pointing  angles  directly.  In  a more  modern  "aided" 
tracking  system,  the  target  position  signal  is  compared  with  a 
predicted  target  position,  based  on  a model  of  both  the  sensor  dynamics 
and  the  target  dynamics.  The  difference  is  used  to  modify  the  state 
estimates  rather  than  to  drive  the  sensor  directly.  The  control  system 
then  computes  the  appropriate  drive  signals  to  null  the  observed  error 
based  on  the  new  updated  state  estimates  and  measured  target  position. 
If  the  model  of  the  system  (sensor  and  target)  is  perfect,  there  should 
be  no  observed  error,  and  the  models  will  not  need  to  be  updated. 
However,  system  noise  and  unmodeled  states  will  produce  errors 
which  will  disturb  the  models  so  that  they  are  constantly  in  need  of 
adjustment.  Errors  in  the  target  model  can  lead  to  a requirement  for 
an  enlarged  search  area  in  the  similarity  detection  process  in  order  to 
compensate  for  bad  (noisy)  predictions  of  future  target  position.  This 
increase  in  search  area  can  result  in  a longer  period  of  time  being 
required  to  find  the  "best  match"  image.  This  increase  in  solution  time 
then  requires  a longer  prediction  time  with  a further  increase  in  the 
possible error, and thus, an even larger search area. Thus, computation
speed in the similarity detection process may define the maximum
allowable search area, and any additional computation time taken by
modeling must be compensated for through a reduced search area.
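
The "aided" tracking loop described above, in which the measured position corrects a state estimate rather than driving the sensor directly, can be sketched for one image axis. The sketch assumes a constant-velocity target model and fixed alpha-beta gains; this is a stand-in chosen for brevity, and neither the model nor the gain values come from this report.

```python
def alpha_beta_step(position, velocity, measurement, dt, alpha=0.5, beta=0.1):
    """One predict/correct cycle of a constant-velocity alpha-beta filter."""
    predicted = position + velocity * dt          # predicted target position
    residual = measurement - predicted            # observed prediction error
    position = predicted + alpha * residual       # corrected position estimate
    velocity = velocity + (beta / dt) * residual  # corrected velocity estimate
    return position, velocity, residual

# Noise-free target moving at 2 pixels per frame; the estimate starts at rest.
position, velocity = 0.0, 0.0
residuals = []
for frame in range(1, 61):
    measurement = 2.0 * frame
    position, velocity, residual = alpha_beta_step(position, velocity, measurement, dt=1.0)
    residuals.append(abs(residual))

print(residuals[0], residuals[-1], velocity)
```

The magnitude of the residual indicates how far from the predicted position the similarity detector must search. In this noise-free case it decays toward zero as the velocity estimate converges; with noisy measurements or unmodeled target motion it would not, and the search area would have to be sized accordingly.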

1.4  Summary  of  Report 


In this chapter the research objectives have been outlined, the
image tracking task has been defined, and the component parts of a
generalized image-tracking system have been discussed. Chapter 2
presents  as  background  material  a summary  of  applications  for  image 
tracking  systems,  and  some  classical  similarity  detection  techniques. 

In Chapter 3, a two-dimensional signal-to-noise ratio is developed for
minimum  norm  similarity  detectors  and  an  upper  bound  is  developed  and 
experimentally  verified  for  the  probability  of  error  associated  with  a 
particular  type  of  misregistration.  Chapter  4 presents  three 
techniques for enhancing the signal-to-noise ratio. Chapter 5 presents
an  adaptive  Kalman  filter  which  is  shown  to  be  very  helpful  in  reducing 
misregistration  errors,  and  Chapter  6 develops  and  analyzes  the 
performance  of  an  integrated  tracking  algorithm  which  incorporates  a 
complementary  set  of  component  algorithms.  Chapter  7 provides  a 
summary  of  this  research  and  some  suggestions  for  future  work. 

Chapter 2
BACKGROUND


In  this  chapter,  some  historical  and  current  applications  for 
image  tracking  systems  are  presented,  and  some  classical  similarity 
detection techniques are discussed. Much of the historical information
presented  in  this  chapter  is  drawn  from  the  author's  personal 
experiences over a ten-year period in development, test, and evaluation
of  image  tracking  systems  at  the  Air  Force  Missile  Development  Center 
and  at  the  Air  Force  Avionics  Laboratory. 

2.1 Applications of Image-Tracking Systems

This  section  will  briefly  discuss  some  historical  and  current 
applications of image-tracking systems in an attempt to provide some
insight  and  background  for  the  unfamiliar  reader,  and  perhaps  relate 
this  research  to  his  experience.  The  reader  is  encouraged  to  think  of 
some application for an image-tracking system which is peculiar to his
field  of  expertise.  For  the  purpose  of  enumeration  here,  image-tracking 
systems  are  divided  into  categories  by  the  location  of  the  tracking 
device: 

I. Space Applications

A.  Automatic  rendezvous  with  noncooperative  targets 

B.  Autonomous  orbit  determination 

C. Aiding earth based tracking for outer planet probes
and orbiters

D.  Precision  pointing  for  remote  sensing  systems 

II.  Airborne  Applications 

A. Air-to-surface missile guidance

B.  Air-to-air  missile  guidance 

C.  Navigation  system  updates 

D. Line-of-sight stabilization for:

1.  Angle  rate  bombing  systems 

2.  Air-to-air  gunnery 

3.  Reconnaissance  sensors 

4. Target identification sensors

III.  Ground  Based  Applications 

A.  Range  instrumentation  systems  for  tracking  aircraft 
and  missiles  with  multiple  cinetheodolite  cameras 

B.  Wind  estimation  in  remote  areas  derived  from  cloud 
tracking  in  satellite  imagery 

C.  Anti-aircraft  gun  pointing  systems 

D. Image registration for remote sensing using satellite
imagery


2.1.1  Space  Applications 

In the space applications area, the theoretical basis for autonomous
orbit determination using line-of-sight information from
landmark-tracking systems has been established [28]. The recent Viking
missions to Mars demonstrated the feasibility of using on-board imaging
sensors to aid in deriving the precision spacecraft tracking information
required for accurate orbit insertion around the outer planets [20].

2.1.2  Airborne  Applications 

The  airborne  applications  are  the  most  highly  developed  real-time 
systems on the list. The Walleye electrooptically guided glide bomb,
the  GBU-8  and  GBU-15  Modular  Guided  Glide  Bombs  (MGGB),  and  both  the  TV 
guided  and  imaging  infrared  versions  of  the  AGM-65  Maverick  missile 
use image-tracking systems to guide the weapons from launch to
target [5], [22]. Other missiles use image-tracking systems in their
terminal phase of flight to obtain extremely high accuracies [4].
Air-to-air image-tracking systems were tested as part of the AIM-82
program to develop a new air-to-air missile [6]. Problems were
encountered when the trackers had difficulty maintaining stable
tracking  as  the  target  flew  in  front  of  a cluttered  background  scene. 

Examples of line-of-sight stabilization systems for target
acquisition and identification are the Target Identification System
Electrooptical (TISEO) employed on the F-4E and the Video Augmented
Tracking System (VATS) modification to the PAVE TACK pod to provide
automatic scene stabilization for the forward-looking infrared sensor
in that system [22], [3], [23].

Another  application  which  is  being  actively  pursued  is  the  use 
of range and line-of-sight rate information in lieu of a doppler radar
to  update  inertial  navigation  system  velocities.  This  requires  very 
stable  tracking  of  available  landmarks.  In  a similar  application,  the 
U.S. Marine Corps A-4M will have a tracker on board to provide
line-of-sight information for an angle-rate bombing system [23]. The use
of an image-tracking system is being investigated to determine the
feasibility of using a director mechanization for air-to-air gunnery
[33]. In this system, the tracker must stabilize the line-of-sight to
the  target. 

2.1.3 Ground-Based Applications

Many ground-based applications of image-tracking systems are similar to
the airborne and spaceborne applications in that they are real-time
target tracking applications. Ground based processing of satellite
imagery (such as earth resources data) is an exception. The
difficulties of image registration from one satellite pass to another
have been studied by Smith and Phillips [34]. This is an area where
improvements in nonreal-time image-tracking appear to be usable.

Another  application  for  advanced  tracking  techniques  is  the  estimation 
of  wind  speed  and  direction  at  remote  locations.  This  has  been 
demonstrated  using  satellite  imagery  [18]. 

2.1.4 Constraints on System Mechanization

The grouping of applications into space, airborne, and ground-based
systems  serves  to  divide  the  applications  by  both  speed  and  equipment 
cost  as  well  as  by  processing  location.  For  the  space  applications,  the 
hardware  costs  will  be  relatively  high,  and  real-time  processing  will 
be  required  for  those  applications  listed. 

The airborne systems in the throwaway category (missile
guidance  units)  must  have  a low  unit  cost  and  a high  processing  rate  to 
accomplish  their  task.  It  is  in  this  group  that  suboptimal  performance 
may  be  tolerable  to  obtain  the  low  cost  and  high  speed  required.  The 
airborne scene stabilization systems can be more costly than the
throwaway guidance units but must operate at real-time rates, typically
30 or 60 images per second, and need a lower value of pointing-angle
noise to accomplish their task of stabilizing long focal length
telescopes. The ground-based applications for satellite image
registration and range instrumentation can employ powerful and
expensive large-scale computers and can operate at nonreal-time rates
to obtain maximum accuracy. At the same time these systems are the
slowest and the most expensive of the present image-tracking systems,
yet  with  improved  algorithms,  further  research  into  the  underlying 
problems,  and  faster  computers,  these  applications  may  eventually  be 
carried  out  at  real-time  rates. 

2.2  Classical  Similarity  Detection  Techniques 

Basic image-tracking techniques can be grouped into three
categories:

1)  Point-Tracking 

2)  Minimum-Distance-Tracking 

3)  Correlation-Tracking 

2.2.1 Point Tracking

Point  tracking  proceeds  on  the  assumption  that  objects  to  be  tracked 
possess  characteristics  (features)  which  are  not  present  in  adjacent 
regions  of  the  image.  Under  this  assumption,  a characteristic  feature 
which  identifies  the  target  class  is  detected  via  a feature  extraction 
process  and  used  to  derive  the  target  position  within  the  field  of 
view.  Sequential  images  are  processed  independently  although  adaptive 
feature extraction methods are usually employed. In their most
elementary form feature extractors have binary valued outputs, i.e., if
a feature is present at a point in the image the output of the feature
extractor is a one at that point, while, if the feature is not present
the output is zero. Features which have been used include the
following:

1)  Intensity  above  a threshold 

2) Intensity-derivative magnitude above a threshold

3) Intensity-derivative sign change

Examples of the results of these feature extractors operating on
one-dimensional data are illustrated in Figure 3.
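
The three binary-valued feature extractors listed above can be sketched for a one-dimensional intensity profile. The function names, the sample profile, the thresholds, and the use of a first difference as the intensity derivative are assumptions made for this illustration and are not the data of Figure 3.

```python
import numpy as np

def intensity_above(signal, threshold):
    """1 where the intensity exceeds the threshold, else 0."""
    return (signal > threshold).astype(int)

def derivative_magnitude_above(signal, threshold):
    """1 where the magnitude of the first difference exceeds the threshold."""
    derivative = np.diff(signal, prepend=signal[0])
    return (np.abs(derivative) > threshold).astype(int)

def derivative_sign_change(signal):
    """1 at interior points where the first difference changes sign."""
    signs = np.sign(np.diff(signal))
    change = np.zeros(signal.size, dtype=int)
    # A strict sign change between the differences on either side of a point.
    change[1:-1] = (signs[:-1] * signs[1:] < 0).astype(int)
    return change

profile = np.array([0, 0, 1, 3, 5, 3, 1, 0, 0])
bright = intensity_above(profile, threshold=2)
edges = derivative_magnitude_above(profile, threshold=1)
peak = derivative_sign_change(profile)
print(bright, edges, peak)
```

For this profile the intensity threshold marks the three brightest samples, the derivative-magnitude threshold marks the steep flanks, and the sign change marks the single peak sample. A flat-topped peak would not be marked by the strict sign test used here.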

For single-feature point-trackers, error signals are computed
using  either  an  area-balance  or  centroid  algorithm.  The  area-balance 
algorithm  integrates  the  area  of  detected  features  in  each  quadrant  of 
the  gated  region  of  the  data  image  and  forms  an  error  signal  for  each 
axis  by  subtracting  the  resulting  values  in  opposite  halves  of  the 
gated  region. 

The so-called "centroid" algorithm is more often a first-moment
algorithm  than  a true  centroid  computation.  The  first-moment  of  the 
area  detected  by  the  feature  extractor  is  the  usual  error  signal.  Both 
the  area  balance  and  centroid  error  signal  generating  equations  can  be 
written  in  terms  of  a weighting  function  multiplied  by  the  preprocessed 
data image value and integrated over the gated region. The horizontal
and vertical error signals are

Figure 4. Error Computations for Point Tracking Algorithms

where the weighting functions are, for area balance,

$$\omega_x(x,y) = \operatorname{sign}(x - x_{cl}), \qquad \omega_y(x,y) = \operatorname{sign}(y - y_{cl}) \tag{2.3}$$

where

$$\operatorname{sign}(z) = \begin{cases} -1 & \text{for } z < 0 \\ 0 & \text{for } z = 0 \\ 1 & \text{for } z > 0 \end{cases}$$

and for the centroid algorithm

$$\omega_x(x,y) = x - x_{cl}, \qquad \omega_y(x,y) = y - y_{cl}$$

and the notation is as shown in Figure 4.
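
A discrete version of the two error computations can be sketched as follows. The sketch assumes that each error signal is the sum, over the gated region, of a weighting function multiplied by the preprocessed image value, as the text describes; the function name, the choice of the gate center as the reference point, the axis sign convention, and the optional normalization are hypothetical.

```python
import numpy as np

def error_signals(gated, method="centroid", normalize=False):
    """Horizontal and vertical error signals for a gated, preprocessed image.

    method="area_balance" weights each pixel by the sign of its offset from
    the gate center; method="centroid" weights it by the offset itself
    (a first moment). normalize=True divides by the detected area, which
    turns the first moment into a true centroid.
    """
    rows, cols = gated.shape
    y = np.arange(rows) - (rows - 1) / 2.0   # vertical offsets from gate center
    x = np.arange(cols) - (cols - 1) / 2.0   # horizontal offsets from gate center
    if method == "area_balance":
        wx, wy = np.sign(x), np.sign(y)
    elif method == "centroid":
        wx, wy = x, y
    else:
        raise ValueError("method must be 'area_balance' or 'centroid'")
    err_x = (gated * wx[np.newaxis, :]).sum()
    err_y = (gated * wy[:, np.newaxis]).sum()
    if normalize and gated.sum() > 0:
        err_x, err_y = err_x / gated.sum(), err_y / gated.sum()
    return err_x, err_y

# Binary feature image: a 3 x 3 target above and to the right of gate center.
gate = np.zeros((9, 9))
gate[1:4, 5:8] = 1.0

balance = error_signals(gate, "area_balance")
moment = error_signals(gate, "centroid")
centroid = error_signals(gate, "centroid", normalize=True)
print(balance, moment, centroid)
```

For this gate the area-balance errors equal the signed detected area, the first-moment errors scale with both the area and the offset, and only the normalized form reports the target offset in pixels.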

A common  method  for  using  the  error  signal  in  point  trackers  is 
to  drive  the  gate  within  the  image  to  null  the  error  signal  and  adjust 
the  sensor  pointing  angles  to  center  the  gate  within  the  field  of  view. 
Centroid  trackers  are  quite  susceptible  to  noise  in  the  preprocessed 
data  image  when  the  detected  target  area  is  small  compared  to  the  gated 
area.  Under  this  condition,  a small  region  of  noise  at  the  edge  of  the 
gated  area  can  cause  an  error  of  one  quarter  of  the  gate  dimension 
because of the relatively large magnitude of the moment contributed by
the  noise.  This  is  one  of  the  reasons  why  most  of  today's  centroid 
trackers  are  mechanized  with  adaptive  gates  (Maverick,  GBU-15,  VATS). 
The  gate  size  is  decreased  as  the  detected  target  area  gets  smaller  in 
order to reduce the tracker's noise sensitivity. A variety of
point-tracker exists which does not convert the intensity image to a
binary-valued image but uses the intensity values themselves to compute an
intensity  centroid.  This  algorithm  is  limited  to  tracking  targets  which 
are  significantly  brighter  (or  darker)  than  their  surroundings.  The 
polarity  of  the  incoming  sensor  data  is  reversed  to  track  dark  targets. 
A primary  advantage  of  this  arrangement  is  that  there  is  no  threshold 
to  be  set  in  the  feature  extractor. 

Point-tracking  algorithms  are  the  basis  for  most  of  the  current 
generation  of  real-time  trackers  for  airborne  applications.  For  these 
applications,  the  input  signals  are  generally  in  the  standard 
television format (525- or 875-line scan, either 2:1 or random
interlace, 30 frames/second). The assumption that the selected feature
is  not  present  in  the  adjacent  regions  of  the  image  guarantees  only 
that  the  target  feature  is  bounded  and  not  that  it  is  unique  within  the 
field  of  view.  As  a result,  a subregion  of  the  field  of  view  must  be 
gated  for  use  by  the  tracker.  A large  number  of  different  algorithms 
have  been  developed,  some  with  adaptive  gate  size,  some  with  fixed  gate 
size,  and  some  with  multiple  gates.  Early  algorithms  used  fixed  feature 
extraction  techniques;  later,  adaptive  thresholds  were  employed.  Error 
signal  generation  was  by  analog  area-balance  techniques  for  early 
designs,  but  first  moment  and  true  centroid  algorithms  have  become 
standard  in  the  most  recent  models.  Selection  of  features  (target 
characteristics) for use in point-tracking algorithms has received
substantial  support  from  the  Department  of  Defense.  The  two  most  common 
feature  extraction  processes  are  intensity  thresholding  and  gradient 
thresholding,  combined  with  elementary  forms  of  spectral  and  spatial 
filtering.  These  forms  of  feature  extraction  have  been  selected  in 
large  part  due  to  the  ease  with  which  they  can  be  implemented  with 
analog  signal  processing  techniques. 

2.2.2  Minimum  Distance  Tracking 

In  order  to  talk  about  the  "distance"  between  two  images  as  a measure 
of  their  similarity,  it  is  necessary  to  represent  images  as  elements  of 
a vector  space.  While  two-dimensional  images  are  commonly  visualized  as 
two-dimensional  arrays  or  matrices  of  intensity  values,  they  can  just 
as  easily  be  thought  of  as  vectors.  Thus,  the  reference  image,  Ir,  will 
have  both  a two-dimensional  and  a one-dimensional  representation. 

Where the possibility of ambiguity exists, the bracketed notation will
be used to indicate the matrix form.

$$[Ir] = \begin{bmatrix} Ir(1,1) & \cdots & Ir(1,M_C) \\ \vdots & & \vdots \\ Ir(M_R,1) & \cdots & Ir(M_R,M_C) \end{bmatrix} \tag{2.7}$$

where $M_R$ and $M_C$ are the number of rows and columns, respectively.

A function $D$ is called a distance function if for a vector
space $X$, with $x_1$, $x_2$, and $x_3$ in $X$, $D$ satisfies the following
conditions [35]:

$$\begin{aligned}
D(x_1,x_2) &= D(x_2,x_1) \\
D(x_1,x_3) &\le D(x_1,x_2) + D(x_2,x_3) \\
D(x_1,x_2) &= 0 \text{ if and only if } x_1 = x_2 \\
D(x_1,x_2) &\ge 0
\end{aligned} \tag{2.9}$$


It  is  possible  to  define  legitimate  distance  functions  in 
unconventional  ways  and  use  these  functions  to  measure  the  degree  of 
similarity  between  two  images.  One  such  distance  function  is  a 
relational  metric  which  depends  not  on  the  intensity  at  each  point  in 
the  image  but  only  on  the  relationship  between  intensity  values  at 
adjacent  points  and  in  adjacent  regions.  By  coding  the  components  of 
the  vector  as  follows 


RELATION    BINARY REPRESENTATION

   <               0 1
   =               0 0
   >               1 0

the Hamming distance can be employed to measure similarity between
images [36]. The importance of this idea is to recognize that
unconventional  distance  functions  may  be  useful  if  they  improve 
performance  or  simplify  computation. 
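A small sketch may make the relational coding concrete. It assumes that the relation is taken between each pixel and its right-hand neighbor along a single row, which is only one of several possible readings of "adjacent points"; the function names and sample rows are hypothetical.

```python
# Two-bit codes for the relations '<', '=', '>' from the table above.
CODES = {-1: (0, 1), 0: (0, 0), 1: (1, 0)}


def relation_code(row):
    """Code the relation between each pixel and its right neighbor as two bits."""
    bits = []
    for a, b in zip(row, row[1:]):
        relation = (a > b) - (a < b)  # -1 for '<', 0 for '=', +1 for '>'
        bits.extend(CODES[relation])
    return bits


def hamming(u, v):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(p != q for p, q in zip(u, v))


row_a = [10, 12, 12, 9]
row_b = [50, 60, 60, 20]   # different intensities, same relations as row_a
row_c = [10, 12, 13, 9]    # one '=' relation replaced by '<'

print(hamming(relation_code(row_a), relation_code(row_b)))  # 0
print(hamming(relation_code(row_a), relation_code(row_c)))  # 1
```

With this coding, '<' and '>' differ in two bits while each differs from '=' in one bit, so the Hamming distance grades the disagreement between relations.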

The  conventional  distance  functions  for  images  are  the  norms  of 
the  difference  image  formed  between  two  image  vectors.  The  norm  of  a 
vector  x will  be  indicated  by  ||x||  and  is  related  to  a specific  inner 
product  by  the  formula 

$$\|x\|^2 = (x,x) \tag{2.10}$$

where $(\cdot,\cdot)$ denotes a generalized inner product [35]. The generalized
inner product of two conformable vectors is defined by a positive
definite Hermitian matrix [24]. If $H$ is a positive definite Hermitian
matrix, and $x_1$ and $x_2$ are column vectors, the inner product $(x_1,x_2)_H$
defined by $H$ is

$$(x_1,x_2)_H = x_1^T H\, x_2 \tag{2.11}$$

where $T$ indicates the transpose of the vector. If $H = I$, the identity
matrix, then $(x_1,x_2)_I$ is the familiar vector dot product. Minimum-norm
tracking seeks to minimize the norm of a difference image formed by
subtracting the data image from the reference image on a point-by-point
basis. The two most common norms are

$$\|x\|_1 = \sum_i |x_i| \tag{2.12}$$

and

$$\|x\|_2 = \left[ \sum_i x_i^2 \right]^{1/2} \tag{2.13}$$

These correspond to the Minkowski norms of order one and two,
respectively [35]. Henceforth, when a norm is used without a
subscript, it will be understood to be the Minkowski norm of order two
with uniform weights ($H = I$).
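The two norms and the weighted inner product can be tried on a small difference image. This is a minimal sketch with images stored as flat lists; the helper names (`inner`, `norm1`, `norm2`) and the sample values are illustrative, not part of the original text.

```python
import math


def inner(x1, x2, H=None):
    """Generalized inner product x1^T H x2; H=None stands for H = I."""
    if H is None:
        return sum(a * b for a, b in zip(x1, x2))
    n = len(x1)
    return sum(x1[i] * H[i][j] * x2[j] for i in range(n) for j in range(n))


def norm1(x):
    """Minkowski norm of order one."""
    return sum(abs(a) for a in x)


def norm2(x, H=None):
    """Minkowski norm of order two, optionally weighted by H."""
    return math.sqrt(inner(x, x, H))


reference = [3, 5, 2, 8]
data = [1, 5, 4, 4]
difference = [r - d for r, d in zip(reference, data)]  # [2, 0, -2, 4]

# A diagonal weighting matrix that de-emphasizes the last pixel.
H = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0.25]]

print(norm1(difference))          # 8
print(norm2(difference) ** 2)     # approximately 24
print(norm2(difference, H) ** 2)  # approximately 12
```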

Barnea  and  Silverman  have  developed  a class  of  minimum  distance 
algorithms  for  fast  digital  image  registration  called  Sequential 
Similarity Detection Algorithms (SSDAs) [8]. These algorithms allow
the use of any distance function which can be defined at each point in
an image pair and, under conditions of high signal-to-noise ratio for
the imagery being processed, require considerably less computation to
find a minimum than the exhaustive search techniques used previously.
Webber has described techniques for setting the SSDA threshold [38].
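The idea of a sequential test can be sketched as follows. This is a simplified illustration, not the algorithm of [8] in detail: it assumes a constant threshold, an absolute-difference error, 0-based offsets, and a fixed pseudo-random pixel order, and the function name `ssda` is hypothetical.

```python
import random


def ssda(ref, data, threshold, seed=0):
    """Return the (u, v) offset that survives the most point tests.

    Error is accumulated pixel by pixel in a random order, and an offset is
    abandoned as soon as its accumulated error exceeds the threshold.
    """
    L = len(ref)
    points = [(i, j) for i in range(L) for j in range(L)]
    random.Random(seed).shuffle(points)  # test pixels in random order
    best_key, best_offset = None, None
    for u in range(len(data) - L + 1):
        for v in range(len(data[0]) - L + 1):
            error, tested = 0, 0
            for i, j in points:
                error += abs(ref[i][j] - data[i + u][j + v])
                if error > threshold:
                    break  # abandon this offset early
                tested += 1
            key = (tested, -error)  # more surviving points first, then less error
            if best_key is None or key > best_key:
                best_key, best_offset = key, (u, v)
    return best_offset


data = [
    [1, 2, 1, 0],
    [0, 1, 5, 2],
    [1, 0, 3, 9],
    [2, 1, 0, 1],
]
ref = [[5, 2], [3, 9]]  # copied from data at row offset 1, column offset 2

print(ssda(ref, data, threshold=3))  # (1, 2)
```

Most wrong offsets are rejected after only one or two pixel comparisons, which is the source of the computational saving when the signal-to-noise ratio is high.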


2.2.3  Correlation Tracking


Correlation trackers seek the location of the subset of the data image
which maximizes either the cross correlation function or the normalized
cross correlation function between a reference image and that
particular subimage. Let $Ir_L(i,j)$ be an $L$ by $L$ reference image, and
let $Id(i,j)$ be an $M_R$ by $M_C$ data image. The elements of the unnormalized
cross correlation surface $R_{rd}(u,v)$ are defined to be

$$R_{rd}(u,v) = \sum_{i=1}^{L} \sum_{j=1}^{L} Ir_L(i,j)\, Id(i+u,j+v) \tag{2.14}$$

$$1 \le u \le M_R + 1 - L, \qquad 1 \le v \le M_C + 1 - L$$


By finding the $(u,v)$ which maximizes this function, the translational
registration error is determined [8]. Normalization is accomplished
by dividing this function by the product of the autocorrelation
functions of the reference image and the data subimage. The normalized
cross correlation surface is defined by

$$R_{rdN}(u,v) = \frac{\left[ \sum_{i=1}^{L} \sum_{j=1}^{L} Ir_L(i,j)\, Id(i+u,j+v) \right]^2}{\left[ \sum_{i=1}^{L} \sum_{j=1}^{L} Ir_L^2(i,j) \right] \left[ \sum_{i=1}^{L} \sum_{j=1}^{L} Id^2(i+u,j+v) \right]} \tag{2.15}$$

These expressions can be rewritten as inner products in the following
form:

$$R_{rd}(u,v) = (Ir, Id_{u,v}) \tag{2.16}$$

$$R_{rdN}(u,v) = \frac{(Ir, Id_{u,v})^2}{(Ir,Ir)\,(Id_{u,v},Id_{u,v})} \tag{2.17}$$

where $Id_{u,v}(i,j) = Id(u+i,v+j)$, and $Ir$ is a suitable reference image.
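The correlation surfaces can be computed directly for small images. The sketch below is illustrative only: it uses 0-based offsets rather than the 1-based indices of the text, stores images as lists of rows, and takes the normalized surface to be the squared inner product divided by the two energies, as in (2.17). The function name and sample data are hypothetical.

```python
def cross_correlation(ref, data, normalized=False):
    """Return {(u, v): R(u, v)} for every placement of ref inside data."""
    L = len(ref)
    rows, cols = len(data), len(data[0])
    ref_energy = sum(p * p for row in ref for p in row)  # (Ir, Ir)
    surface = {}
    for u in range(rows - L + 1):
        for v in range(cols - L + 1):
            # Inner product of the reference with the shifted subimage.
            num = sum(ref[i][j] * data[i + u][j + v]
                      for i in range(L) for j in range(L))
            if normalized:
                sub_energy = sum(data[i + u][j + v] ** 2
                                 for i in range(L) for j in range(L))
                surface[(u, v)] = (num * num / (ref_energy * sub_energy)
                                   if sub_energy else 0.0)
            else:
                surface[(u, v)] = num
    return surface


data = [
    [1, 2, 1, 0],
    [0, 1, 5, 2],
    [1, 0, 3, 9],
    [2, 1, 0, 1],
]
ref = [[5, 2], [3, 9]]  # copied from data at row offset 1, column offset 2

raw = cross_correlation(ref, data)
norm = cross_correlation(ref, data, normalized=True)
print(max(raw, key=raw.get), raw[(1, 2)])     # (1, 2) 119
print(max(norm, key=norm.get), norm[(1, 2)])  # (1, 2) 1.0
```

By the Cauchy-Schwarz inequality the normalized surface cannot exceed one, and it reaches one only where the subimage is proportional to the reference.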


Various  preprocessing  techniques  have  been  investigated  to 
enhance  the  raw  imagery  prior  to  performing  the  cross  correlation. 
Hayes  has  suggested  that  thresholding,  Laplacian  enhancement,  edge 
enhancement,  and  neighborhood  averaging  may  be  appropriate  techniques 
to  apply  depending  on  the  sensor  used  to  obtain  the  imagery  and  the 
imagery itself [13]. Cross correlation between transformed images has
also  been  demonstrated  with  inversion  [13],  Fourier  transformation 
(magnitude  and  phase  correlation)  [1],  marginal  summation  [41],  and 
phase  correlation  on  the  marginal  sums  [11].  Pratt  has  shown  that  the 
peak  of  the  statistical  correlation  measure  can  be  appreciably 
sharpened  by  application  of  linear  spatial  preprocessing  [29]. 

2.3  Summary

In this chapter an historical summary of image tracking
applications and similarity detection techniques was presented. In the
following chapters an appropriate signal-to-noise ratio will be
developed for the image tracking task, and four new techniques will be
introduced to increase the effective tracking signal-to-noise ratio.


Chapter 3

SIMILARITY  DETECTION 

In  this  chapter,  the  relationship  between  minimum  norm  and 
maximum  cross  correlation  techniques  for  similarity  detection  is  shown, 
the  characteristics  of  the  minimum  norm  distance  function  as  applied  to 
a difference image are investigated, and a signal-to-noise ratio which
is applicable to the minimum distance similarity detection problem is
developed. In Chapter 4, signal-to-noise enhancement techniques are
developed, and in Chapter 6 these techniques are applied to an
integrated  tracking  algorithm. 

3.1  Minimum  Norm  vs  Maximum  Cross  Correlation 

Similarity  detection  via  minimization  of  the  norm  of  the 
difference  between  two  images  is  equivalent  to  maximizing  the  cross 
correlation  function  between  the  same  images  under  a restricting  set  of 
assumptions.  To  see  this,  express  the  square  of  the  norm  of  the 
difference  between  two  images  as  an  inner  product. 

$$\|Ir - Id\|^2 = (Ir - Id, Ir - Id) \tag{3.1}$$

Expanding this expression yields

$$\|Ir - Id\|^2 = (Ir,Ir) - (Ir,Id) - (Id,Ir) + (Id,Id) \tag{3.2}$$

and since the image components are real valued, the conjugate symmetry
of the inner product allows us to write this as

$$\|Ir - Id\|^2 = (Ir,Ir) + (Id,Id) - 2(Ir,Id) \tag{3.3}$$

Since Ir is the reference image, (Ir,Ir) is a constant for all trial
data images. The last term on the right-hand side of (3.3) is, apart
from the factor of -2, the
cross correlation between Ir and Id (see Section 2.2.3). From this we
observe  that  if  (Id, Id)  is  a constant,  then  minimizing  the  norm  of  the 
difference  image  is  equivalent  to  maximizing  the  cross  correlation 
function  between  Ir  and  Id.  When  (Id, Id)  is  a constant,  the  procedures 
which  will  be  developed  in  the  following  sections  with  respect  to 
minimum  norm  algorithms  will  produce  results  which  are  equivalent  to 
maximum  cross  correlation  tracking  algorithms.  In  all  other  cases, 
while  results  may  be  similar,  no  guarantee  is  made  about  their 
equivalence. 
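A short numerical check illustrates both the identity (3.3) and the role of the (Id,Id) term. The vectors are made-up examples chosen so that (Id,Id) is not constant across the trial data images.

```python
def inner(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def sq_norm_diff(a, b):
    """Squared order-two norm of the difference a - b."""
    diff = [p - q for p, q in zip(a, b)]
    return inner(diff, diff)


ref = [1, 2, 3]
trials = {"match": [1, 2, 3], "bright": [4, 4, 4]}

for name, d in trials.items():
    # Identity (3.3): ||Ir - Id||^2 = (Ir,Ir) + (Id,Id) - 2(Ir,Id)
    assert sq_norm_diff(ref, d) == inner(ref, ref) + inner(d, d) - 2 * inner(ref, d)

by_min_norm = min(trials, key=lambda k: sq_norm_diff(ref, trials[k]))
by_max_corr = max(trials, key=lambda k: inner(ref, trials[k]))
print(by_min_norm, by_max_corr)  # match bright
```

Because the brighter trial image has a larger (Id,Id), the unnormalized cross correlation prefers it even though it is the poorer match in the minimum-norm sense.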

All  of  the  tracking  algorithms  to  be  investigated  will  be 
minimum norm algorithms. Specifically, with the exception of the
nonuniformly weighted norm which will be developed in Section 4.1, all
norms will be uniformly weighted Minkowski norms of order two.

3.2  Notation 

In  order  to  facilitate  precise  descriptions  and  a compact 
notation,  the  following  list  of  terms  and  symbols  will  be  useful: 
UNDERLYING  IMAGE,  Is  - A perfect  representation  of  the  scene  being 
viewed  by  the  sensor.  Both  Ir  and  Id  are  noise  corrupted 
versions  of  Is. 
REFERENCE INDEX SET, L - An Nx2 index set specifying the row and column
associated with a particular element of the reference set. If
the reference set were always a contiguous block of
pixels of a fixed size, then the location of one corner and the
length of each side would suffice to identify all of the pixels
in the reference set. Historically this has been the
configuration for the reference set [40], [8]. In Section 4.3
we will see that the effective signal-to-noise ratio is
increased by selecting pixels from high signal regions of the
image for inclusion in the reference set. Because the pixels
in the reference set may be spread out over the whole image,
the reference index set is required to keep track of the row
and column associated with each included pixel.

$$L(i,j), \qquad 1 \le i \le N, \quad j = 1,2$$

REFERENCE SET - The subset of the reference image indexed by the
reference index set.

$$Ir(L(i,1),L(i,2)), \qquad 1 \le i \le N$$

DIFFERENCE IMAGE, D - The image formed by shifting Id(.,.) with respect
to Ir(.,.) and taking the point-by-point difference. The
difference image is not defined where Ir and the shifted Id do
not overlap.

$$D(dy,dx,i,j) = Ir(i,j) - Id(i+dy,j+dx)$$

$$-dy_{MAX} \le dy \le dy_{MAX}, \qquad -dx_{MAX} \le dx \le dx_{MAX}$$

$$\max[1,dy] \le i \le \min[M_R, M_R - dy]$$

$$\max[1,dx] \le j \le \min[M_C, M_C - dx]$$


COMPARISON SET, Ic - The set of difference-image elements specified by
the reference index set.

$$Ic(dy,dx,i) = D(dy,dx,L(i,1),L(i,2)) \tag{3.4}$$

$$1 \le i \le N, \qquad -dy_{MAX} \le dy \le dy_{MAX}, \qquad -dx_{MAX} \le dx \le dx_{MAX}$$


TRIAL REGISTRATION - The relative shift of the data image with respect
to the positions of the picture elements as received from the
sensor. Each trial registration produces a different comparison
set. The trial registration which produces the comparison set
with minimum norm will be denoted $(\hat{dy},\hat{dx})$; otherwise the trial
registration will be denoted $(dy,dx)$.


3.3  Assumptions 


It is assumed that:

1)  Id differs from Ir by a translational
misregistration and additive zero-mean noise in each
image

$$Id(i,j) - n_d(i,j) = Ir(i+\hat{dy},j+\hat{dx}) - n_r(i+\hat{dy},j+\hat{dx}) \tag{3.5}$$

2)  The noise components of Ir and Id are
uncorrelated

$$E[n_d(i,j)\, n_r(i+dy,j+dx)] = 0 \tag{3.6}$$

where $n_r(\cdot,\cdot)$ is the zero mean noise associated with the
reference image, $n_d(\cdot,\cdot)$ is the zero mean noise associated with the data
image, and $E(\cdot)$ is the expected value operator.

Both of these assumptions will hold when the principal source of noise
is electronic shot noise, and the average scene illumination changes
slowly with respect to the frame rate.


3.4  Distance  Function 


The distance function $d_A^2(dy,dx)$ is defined as the weighted norm
of $Ic(dy,dx,\cdot)$

$$d_A^2(dy,dx) = Ic(dy,dx,\cdot)^T A\; Ic(dy,dx,\cdot) = \sum_{i=1}^{N} a_{ii}\, Ic^2(dy,dx,i) \tag{3.7}$$

where $A$ is a diagonal matrix of positive weighting factors with
elements $a_{ij}$. When $A$ is the identity matrix, we will use the notation
$d_I^2(dy,dx)$ to indicate the specific case of equal weights for all
components of $Ic(dy,dx,\cdot)$.


3.5  Tracking  Algorithm 


The tracking algorithm will compute $d_A^2(dy,dx)$ for a range of dy
and dx values, and select the (dy,dx) which corresponds to the minimum
value of $d_A^2(dy,dx)$ as the relative translation of Id with respect to
Ir. The range of values for dy and dx specifies the size of the search
region. All search regions will be treated as being symmetric with
$-dy_{MAX} \le dy \le dy_{MAX}$ and $-dx_{MAX} \le dx \le dx_{MAX}$. This is based on the
assumptions that the search is centered on the most likely location for
the image, and that the distribution of errors is symmetric about that
location.
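The pieces defined so far (reference index set, comparison set, weighted distance, and exhaustive search) can be put together in a short sketch. Several things here are assumptions of the illustration rather than statements from the text: indices are 0-based, the comparison set is taken as Ir[L(i,1),L(i,2)] minus Id[L(i,1)+dy, L(i,2)+dx], the caller keeps every shifted index inside the data image, and the synthetic scene and function names are invented for the example.

```python
def comparison_set(ref, data, index_set, dy, dx):
    """Ic(dy,dx,i): reference pixel minus the shifted data pixel."""
    return [ref[r][c] - data[r + dy][c + dx] for r, c in index_set]


def distance_sq(ic, weights=None):
    """Weighted squared distance: sum of a_ii * Ic_i^2 (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(ic)
    return sum(a * e * e for a, e in zip(weights, ic))


def track(ref, data, index_set, dy_max, dx_max, weights=None):
    """Exhaustive search of the symmetric region; returns (d2, dy, dx) at the minimum."""
    best = None
    for dy in range(-dy_max, dy_max + 1):
        for dx in range(-dx_max, dx_max + 1):
            ic = comparison_set(ref, data, index_set, dy, dx)
            d2 = distance_sq(ic, weights)
            if best is None or d2 < best[0]:
                best = (d2, dy, dx)
    return best


# Synthetic noise-free scene; the data image is the reference moved one pixel
# to the right, so data[r][c + 1] == ref[r][c].
ref = [[10 * r * r + c * c for c in range(8)] for r in range(8)]
data = [[10 * r * r + (c - 1) * (c - 1) for c in range(8)] for r in range(8)]

# A scattered reference index set (row, column), kept away from the borders.
index_set = [(2, 2), (2, 5), (3, 4), (4, 2), (5, 3), (5, 5)]

print(track(ref, data, index_set, dy_max=2, dx_max=2))  # (0.0, 0, 1)
```

The reference index set need not be a contiguous block; only the listed pixels contribute to the distance.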
3.6  Noise Characteristics

The reference image Ir is of course not a perfect
representation of the underlying scene being viewed by the sensor. If
Ir is obtained directly from the sequence of data images, it contains
noise with the same characteristics as all of the other data images.
If the reference image is obtained by filtering the sequence of data
images, then the noise component of Ir will differ from that of the raw
data images. The variance of the reference image noise component
associated with the image location Ir(i,j) will be denoted by
$\sigma_{n,REF}^2(i,j)$, that is

$$\sigma_{n,REF}^2(i,j) = E[n_r^2(i,j)] \tag{3.8}$$

The noise variance associated with the (i,j) coordinate of the
data image Id is $\sigma_{n,DATA}^2(i,j)$

$$\sigma_{n,DATA}^2(i,j) = E[n_d^2(i,j)] \tag{3.9}$$

For sensors with single channel outputs where thermal noise is
the dominant component of $n_d$, we will assume that the noise is ergodic
and hence has stationary statistics [27]. Vidicons, laser line
scanning systems with single detectors, and certain infrared scanners
fall into this class of imaging sensor [39]. In addition, since the
sensor output is generally low pass filtered prior to digitization, a
first order Markov model will be used for the sensor noise statistics.
In Section 3.8 a model for imagery will be developed and the Markov
nature of the sensor noise will be seen.

Stationary noise statistics allow us to eliminate the spatial
specificity of the noise variances and simply write them as $\sigma_{n,REF}^2$ and
$\sigma_{n,DATA}^2$. Spatial and temporal ergodicity will allow us to estimate the
noise variance at an arbitrary point in the image from the sample
variance of the noise over the whole image (note that for scanned
sensors spatial and temporal ergodicity are equivalent).
3.7  Characteristics of the Auto-Distance Function

The auto-distance function is the two-dimensional distance
function which results from computing $d_A^2(dy,dx)$ between an image and a
translated copy of the same image. The auto-distance function has
characteristics which are determined by the statistical signal and
noise properties of the image itself.

Consider the case of shifting the reference image with respect
to itself in a direction parallel to a scan line from an initially
registered position. At registration, the distance $d_I^2(0,0)$ is zero
since all of the elements of the comparison set are zero.

$$\begin{aligned}
Ic(0,0,i) &= D[0,0,L(i,1),L(i,2)] \\
&= Ir[L(i,1),L(i,2)] - Ir[L(i,1)+0,L(i,2)+0] \\
&= 0
\end{aligned} \tag{3.10}$$

For a shift of one picture element, or pixel, $d_I^2(0,1)$ has contributions
from both signal and noise components.

$$\begin{aligned}
Ic(0,1,i) = {} & Is[L(i,1),L(i,2)] + n_r[L(i,1),L(i,2)] \\
& - Is[L(i,1),L(i,2)+1] - n_r[L(i,1),L(i,2)+1]
\end{aligned} \tag{3.11}$$

Let

$$m_i(dy,dx) = Is[L(i,1)+dy,L(i,2)+dx] - Is[L(i,1),L(i,2)] \tag{3.12}$$

then

$$d_A^2(dy,dx) = \sum_{i=1}^{N} a_{ii} \left\{ -m_i(dy,dx) + n_r[L(i,1),L(i,2)] - n_r[L(i,1)+dy,L(i,2)+dx] \right\}^2 \tag{3.13}$$

The signal component of $d_A^2(dy,dx)$ is the contribution from
$\{ m_i(dy,dx) \}$. The noise terms represent the difference between the
noise at two pixel locations separated from each other by a translation
of (dy,dx). For line scanned imagery, the noise will be modeled as a
Markov process in time; thus the noise will be correlated with itself
much more strongly in the direction of scan than in the direction
perpendicular to the scan.

Taking the expected value of (3.13) yields

$$\frac{E[d_I^2(dy,dx)]}{N} = m_{N,AVE}^2(dy,dx) + 2\sigma_{n,REF}^2 [1 - \rho(dy,dx)] \tag{3.16}$$

where

$$m_{N,AVE}^2(dy,dx) = \frac{1}{N} \sum_{i=1}^{N} m_i^2(dy,dx) \tag{3.17}$$

Let $\tau(dy,dx)$ be the scan time difference between two pixels
separated by (dy,dx)

$$\tau(dy,dx) = \tau_y\, dy + \tau_x\, dx \tag{3.18}$$

where $\tau_y$ is the time that the sensor takes to scan one entire line and
$\tau_x$ is the scan time between two adjacent pixels in the same line.

For a first order Markov process (see Section 3.6) the
normalized autocorrelation function has the form [12]

$$\phi(\tau) = e^{-\alpha |\tau|} \tag{3.19}$$

where $1/\alpha$ is the correlation time of the process. The correlation
coefficient $\rho(dy,dx)$ in (3.16) is the value of the normalized
autocorrelation function of the noise when the time delay is the scan
time difference between two pixels separated by (dy,dx). Substituting
(3.18) into (3.19) yields an expression for the correlation coefficient
in terms of dy and dx.

$$\rho(dy,dx) = \exp\left[ -\alpha\, |\tau_y\, dy + \tau_x\, dx| \right] \tag{3.20}$$


For a minimum resolution TV compatible scanner with 256
pixels/horizontal line ($\tau_y \ge 256\,\tau_x$) and a correlation time of a few
pixels (say $1/\alpha \le 4\tau_x$), it is clear that $\rho(dy,dx)$ is approximately zero
for $dy \ne 0$.

$$\rho(dy,dx) \le \exp\left[ -\frac{1}{4\tau_x} \left| 256\,\tau_x\, dy + \tau_x\, dx \right| \right] = \exp\left[ -\left| 64\, dy + 0.25\, dx \right| \right] \tag{3.21}$$


Thus  we  can  assume  that  the  noise  component  of  line  scanned  imagery  is 
uncorrelated  between  pixels  which  are  adjacent  to  each  other  in  a 
direction  perpendicular  to  the  direction  of  scan,  but  we  must  take  into 
account  the  correlation  which  exists  between  pixels  that  are  adjacent 
to  each  other  in  the  direction  of  scan. 
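The size of the two correlations is easy to check numerically. The sketch below evaluates the correlation coefficient of (3.20) with the pixel time as the unit of time, a line time of exactly 256 pixel times, and a correlation time of exactly 4 pixel times; these specific values are the limiting cases quoted in the text, used here as illustrative assumptions.

```python
import math


def rho(dy, dx, alpha, tau_y, tau_x):
    """Noise correlation coefficient for a first order Markov process, as in (3.20)."""
    return math.exp(-alpha * abs(tau_y * dy + tau_x * dx))


tau_x = 1.0              # scan time between adjacent pixels (unit of time)
tau_y = 256.0 * tau_x    # scan time for one full line
alpha = 1.0 / (4.0 * tau_x)  # correlation time of 4 pixels

along_scan = rho(0, 1, alpha, tau_y, tau_x)    # adjacent pixels on one line
across_scan = rho(1, 0, alpha, tau_y, tau_x)   # adjacent pixels on adjacent lines

print(round(along_scan, 4))  # 0.7788
print(across_scan < 1e-27)   # True
```

Neighbors along a line remain strongly correlated, while neighbors on adjacent lines are separated by a full line time and are effectively uncorrelated.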


3.8  A Facet  Model  for  Imagery 


In order to do anything useful with $d_A^2(dy,dx)$, we need an image
model with tractable characteristics for $m(dy,dx)$. The model that we
will use assumes that the image intensity is a well-behaved function
that can be approximated by a local tangent plane (facet) in the
vicinity of each pixel coordinate. The resulting expression for
$Is(i+dy,j+dx)$ is a truncated Taylor series in two variables

$$Is(i+dy,j+dx) = Is(i,j) + dy\, \nabla Is(i,j) \cdot 1_y + dx\, \nabla Is(i,j) \cdot 1_x \tag{3.22}$$

where $\nabla$ is a discrete gradient operator defined by

$$\nabla Is(i,j) = \frac{Is(i+1,j) - Is(i-1,j)}{2}\, 1_y + \frac{Is(i,j+1) - Is(i,j-1)}{2}\, 1_x \tag{3.23}$$

and $1_x$ and $1_y$ are unit vectors parallel to the x and y axes
respectively.

For $L(k,1) = i$ and $L(k,2) = j$ the expression for $m_k(dy,dx)$ is

$$m_k(dy,dx) = \nabla Is(i,j) \cdot (dy\, 1_y + dx\, 1_x) \tag{3.24}$$

and substituting (3.24) into (3.17) the general expression for
$m_{N,AVE}^2(dy,dx)$ is

$$m_{N,AVE}^2(dy,dx) = \frac{1}{N} \sum_{i=1}^{N} \left[ \nabla Is[L(i,1),L(i,2)] \cdot (dy\, 1_y + dx\, 1_x) \right]^2 \tag{3.25}$$
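The facet approximation can be sketched with a central-difference gradient. In this illustration the first index is the row (y) and the second the column (x), indices are 0-based, the average signal term is taken as the mean of the squared facet differences, and the planar test image is a made-up example for which the facet model is exact.

```python
def gradient(img, i, j):
    """Central-difference gradient (g_y, g_x) at pixel (i, j), as in (3.23)."""
    g_y = (img[i + 1][j] - img[i - 1][j]) / 2.0
    g_x = (img[i][j + 1] - img[i][j - 1]) / 2.0
    return g_y, g_x


def m_sq_ave(img, index_set, dy, dx):
    """Average squared signal difference predicted by the facet model."""
    total = 0.0
    for i, j in index_set:
        g_y, g_x = gradient(img, i, j)
        total += (g_y * dy + g_x * dx) ** 2
    return total / len(index_set)


# A planar test image: intensity rises 3 per row and 0.5 per column.
img = [[3.0 * i + 0.5 * j for j in range(6)] for i in range(6)]
index_set = [(1, 1), (2, 3), (4, 2), (3, 4)]

print(m_sq_ave(img, index_set, 1, 0))  # 9.0
print(m_sq_ave(img, index_set, 0, 1))  # 0.25
print(m_sq_ave(img, index_set, 1, 2))  # 16.0
```

The quadratic growth with the shift is the same form as the fitted signal model used for the simulated noise-free image.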


Figure 5 shows the expected value of the normalized auto-distance
function for a signal-free image ($m_{N,AVE}^2(dy,dx) = 0$) with a
noise variance of $\sigma_n^2 = 3$ and a noise-correlation coefficient of the form

$$\rho(dy,dx) = \begin{cases} \exp[-0.7\,|dx|] & \text{for } dy = 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.26}$$

Figure 6 shows the value of the auto-distance function for a noise-free
image ($\sigma_n = 0$) with average signal strength

$$m_{N,AVE}^2(dy,dx) = 1.66\, dy^2 + 0.16\, dx^2 \tag{3.27}$$

where the coefficients are adjusted to match the empirical data listed
in Table 1. Figure 7 shows the expected value of the auto-distance
function for an image which combines the noise characteristics from
Figure 5 and the signal characteristics from Figure 6.
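The combined surface can be tabulated from the two models. The sketch below assumes that the expected normalized auto-distance is the signal term plus twice the noise variance times one minus the correlation coefficient, and it reads the quoted noise level as a variance of 3; both are assumptions of this illustration, as are the function names.

```python
import math


def rho(dy, dx):
    """Noise correlation model: exponential along the scan line, zero across lines."""
    return math.exp(-0.7 * abs(dx)) if dy == 0 else 0.0


def signal(dy, dx):
    """Fitted average signal strength, quadratic in the shift."""
    return 1.66 * dy * dy + 0.16 * dx * dx


def expected_auto_distance(dy, dx, noise_var=3.0):
    """Expected normalized auto-distance: signal + 2 * variance * (1 - rho)."""
    return signal(dy, dx) + 2.0 * noise_var * (1.0 - rho(dy, dx))


for dy in (-1, 0, 1):
    row = [round(expected_auto_distance(dy, dx), 2) for dx in (-2, -1, 0, 1, 2)]
    print(dy, row)
```

At zero shift the value is zero; one pixel along the scan line gives about 3.18, and one pixel across lines gives 7.66, so the surface is much steeper perpendicular to the scan.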

Figure 6.  Auto-Distance Function for a Simulated Noise-Free Image

To demonstrate the facet image model by comparison with
experimental data, Table 1 lists the values of the auto-distance
function for a small subimage taken from a frame of television imagery
(see Appendix X for source data). Figure 8 shows the plotted
values for dy = 0 and dx = 0.

Note  the  markedly  smaller  signal  component  in  the  direction 
parallel  to  the  axis  of  the  scan  (x-axis)  and  the  similarity  between 
Figure  7 and  Figure  8.  The  differential  signal  strength  in  the  x and  y 
directions  is  attributable  to  a high  digitizer  sampling  rate  relative 
to  the  bandwidth  of  the  output  signal  from  the  sensor.  The  similarity 
of  Figure  7 and  Figure  8 demonstrates  the  validity  of  the  facet  model 
and  the  Markov  nature  of  the  noise. 

3.9  Cross-Distance  Function 

When $d_I^2(dy,dx)$ is used to compare a reference image Ir with
noise variance $\sigma_{n,REF}^2$ to a data image Id with noise variance $\sigma_{n,DATA}^2$,
the resulting two-dimensional distance function will be called the
cross-distance function. The cross-distance function is biased with
respect to the auto-distance function by an amount equal to the sum of
the noise variances from the reference image and the data image. For a
correctly registered data image

$$\frac{E[d_I^2(0,0)]}{N} = \sigma_{n,REF}^2 + \sigma_{n,DATA}^2 \tag{3.28}$$

and for a relative translation of (dy,dx)

$$\frac{E[d_I^2(dy,dx)]}{N} = m_{N,AVE}^2(dy,dx) + \sigma_{n,REF}^2 + \sigma_{n,DATA}^2 \tag{3.29}$$


Table 2 lists the values of the cross-distance function
between two successive frames of actual TV imagery, and Figure 9
shows the plotted values for dy = 0 and dx = 1. In this case, the
minimum distance match occurs at a translation of (0,1). It is the
location of this minimum distance coordinate that the tracking
algorithm declares to be the present position of the current reference
image. If the location of the minimum distance coordinate changes on a
frame-to-frame basis, the change is interpreted to be either sensor
motion or image motion.

In  the  next  two  sections  we  will  develop  an  expression  for  the 
probability  of  making  a particular  error  in  determining  the  correct 
registration  of  a data  image  with  respect  to  the  reference  image, 
propose  a natural  "signal-to-noise"  ratio  for  the  minimum  norm  tracking 
problem  and  show  the  relationship  between  this  signal-to-noise  ratio 
and  the  probability  of  error. 
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f ^ m 7^1 

3-10  Distance  Function  Statistics 


Since each component of Ic is the difference between two
noise-corrupted image intensity values, Ic is a random vector. The mean
value of the $i$th component is

$$E[Ic(dy,dx,i)] = -m_i(dy,dx) \tag{3.30}$$

and the variance is

$$\operatorname{var}[Ic(dy,dx,i)] = \operatorname{var}[n_r] + \operatorname{var}[n_d] = \sigma_{n,REF}^2 + \sigma_{n,DATA}^2 \tag{3.31}$$

If $n_r$ and $n_d$ are Gaussian random variables (we will see later
that this assumption is in good agreement with experimental data) then

$$\frac{d_I^2(dy,dx)}{\sigma_{n,REF}^2 + \sigma_{n,DATA}^2}$$

is a noncentral chi-square distributed random variable with
noncentrality parameter $\theta$ and N degrees of freedom where [16]

$$\theta(dy,dx) = \frac{N\, m_{N,AVE}^2(dy,dx)}{\sigma_{n,REF}^2 + \sigma_{n,DATA}^2} \tag{3.32}$$

For (dy,dx) = (0,0) the resulting distribution is central chi-square with
N degrees of freedom.
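The mean of the normalized distance can be checked with a small Monte Carlo experiment. The sketch assumes the convention in which the noncentrality parameter is the sum of the squared component means divided by the total noise variance, so that the mean of the normalized distance is N plus that parameter; the sample sizes, the seed, and the function name are arbitrary illustrative choices.

```python
import random


def mean_normalized_distance(means, sigma, trials, seed=0):
    """Average of sum(((m_i + noise_i) / sigma)^2) over many Gaussian noise draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(((m + rng.gauss(0.0, sigma)) / sigma) ** 2 for m in means)
    return total / trials


N = 16
means = [1.0] * N          # signal differences at the sampled pixels
sigma = 2.0                # standard deviation of the combined noise
theta = sum(m * m for m in means) / sigma ** 2   # noncentrality: 4.0

estimate = mean_normalized_distance(means, sigma, trials=4000)
print(N + theta)           # 20.0 (theoretical mean)
print(round(estimate, 1))  # close to 20
```

Setting all the means to zero reproduces the central case, whose mean is simply N.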

For each (dy,dx), there exists the possibility that, as a result
of noise in both the reference image and the data image, the distance
function value at the correct registration is greater than the distance
function value at an incorrect registration. For a correctly
registered data image the tracking algorithm will make an error any
time that there is some (dy',dx') such that

$$d_I^2(0,0) - d_I^2(dy',dx') > 0 \tag{3.33}$$

Let

$$\sigma_n^2 = \sigma_{n,REF}^2 + \sigma_{n,DATA}^2 \tag{3.34}$$

and define two normalized random variables U and V

$$U(dy,dx) = \frac{d_I^2(dy,dx)}{\sigma_n^2} \tag{3.35}$$

and

$$V = \frac{d_I^2(0,0)}{\sigma_n^2} \tag{3.36}$$


In general U and V are not independent random variables since it is
possible to have a reference index set with

$$[L(i,1),L(i,2)] = [L(j,1)+dy,L(j,2)+dx] \tag{3.37}$$

for some legitimate set of (i,j,dy,dx). In fact, this case
predominates when the reference set is composed of a contiguous block
of pixels. Under this condition, a noise sample may contribute to both
$d_I^2(0,0)$ and $d_I^2(dy,dx)$. The only way to ensure that $d_I^2(0,0)$ and
$d_I^2(dy,dx)$ are independent is to require that for all $i, j \le N$ with $i \ne j$

$$|L(i,1) - L(j,1)| > dy_{MAX} \tag{3.38}$$

and

$$|L(i,2) - L(j,2)| > dx_{MAX} \tag{3.39}$$

Figure 10 illustrates the spacing required to ensure that U and V are
uncorrelated when the noise samples are independent in a uniformly
spaced reference set with $dy_{MAX} = dx_{MAX} = 5$ and $N = 64$.

Let $P_c(dy,dx)$ denote the probability that for a correctly
registered data image $d_A^2(0,0)$ is less than $d_A^2(dy,dx)$

$$P_c(dy,dx) = P[d_A^2(0,0) - d_A^2(dy,dx) < 0] \tag{3.40}$$

$P_c(dy,dx)$ is the probability of being correct with respect to the
decision on whether the reference image is registered at (0,0) or at
(dy,dx). The probability of error, $P_E(dy,dx)$, is the complement of
$P_c(dy,dx)$

$$P_E(dy,dx) = 1 - P_c(dy,dx) \tag{3.41}$$
Let

$$W(dy,dx) = V - U(dy,dx) \tag{3.42}$$

For large values of N, both U and V tend toward a normal distribution
[16]. We will normalize W(dy,dx) to zero mean and unit variance by the
transformation

$$X(dy,dx) = \frac{W(dy,dx) - E[W(dy,dx)]}{\sqrt{\operatorname{var}[W(dy,dx)]}} \tag{3.43}$$

and for large N, approximate the probability distribution function of X
by the unit normal distribution. Thus

$$P_c(dy,dx) = P[W(dy,dx) < 0] = P\left[ X(dy,dx) < \frac{-E[W(dy,dx)]}{\sqrt{\operatorname{var}[W(dy,dx)]}} \right] = \operatorname{erf}[\xi_N(dy,dx)] \tag{3.44}$$

where $\operatorname{erf}(\cdot)$ is the unit normal distribution function, defined by

$$\operatorname{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt \tag{3.45}$$
We now derive the expressions for the argument of $\operatorname{erf}(\cdot)$ in terms of N
and $\theta$, the parameters of the noncentral chi-square distributions
associated with U and V.

$$E[W(dy,dx)] = E[V] - E[U(dy,dx)] = N - [N + \theta(dy,dx)] = -\theta(dy,dx) \tag{3.46}$$

$$\begin{aligned}
\operatorname{var}[W(dy,dx)] &= \operatorname{var}[V - U(dy,dx)] \\
&= E[V^2] + E[U^2(dy,dx)] + \theta^2(dy,dx) + 2\theta(dy,dx)E[V] \\
&\quad - 2E[V\,U(dy,dx)] - 2\theta(dy,dx)E[U(dy,dx)]
\end{aligned} \tag{3.47}$$

$$E[V] = N \tag{3.48}$$

$$E[V^2] = N^2 + 2N \tag{3.49}$$

$$E[U(dy,dx)] = N + \theta(dy,dx) \tag{3.50}$$

$$E[U^2(dy,dx)] = [N + \theta(dy,dx)]^2 + 2[N + 2\theta(dy,dx)] \tag{3.51}$$

$$E[V\,U(dy,dx)] = \sigma_V\, \sigma_{U(dy,dx)}\, \rho_{VU} + E[V]\,E[U(dy,dx)] \tag{3.52}$$

where $\rho_{VU}$ is the correlation coefficient between V and U(dy,dx).

Substituting (3.48) through (3.52) into (3.47) yields

$$\operatorname{var}[W(dy,dx)] = 4N + 4\theta(dy,dx) - 2\rho_{VU}\, \sigma_V\, \sigma_{U(dy,dx)}$$

Let

$$\gamma_{N,AVE}^2(dy,dx) \triangleq \frac{m_{N,AVE}^2(dy,dx)}{\sigma_n^2} \tag{3.55}$$

It appears that $\gamma_{N,AVE}(dy,dx)$ is the natural signal-to-noise
ratio for this minimum-norm detection problem. Notice that
$\gamma_{N,AVE}(dy,dx)$ is different for each (dy,dx). This
signal-to-noise ratio is a two-dimensional function.

An upper bound can be established for $P_\varepsilon(dy,dx)$, the
probability of a particular error, by letting $\rho_{VU} = 0$ in (3.54).
Then

$$\xi_N(dy,dx) = \frac{\sqrt{N}\,\gamma_{N,AVE}^2}{2\sqrt{1 + \gamma_{N,AVE}^2}} \tag{3.56}$$

and

$$P_\varepsilon(dy,dx) \le 1 - \mathrm{erf}^*[\xi_N(dy,dx)] \tag{3.57}$$

The upper bound is a function of $\gamma_{N,AVE}^2$ and N only. Figure 11
shows how the maximum probability of error varies with
$\gamma_{N,AVE}^2$ for various values of N.
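The bound of (3.56) and (3.57) is easy to tabulate. The sketch below is an illustration only: it assumes that erf*(x) is the unit normal cumulative distribution (built here from Python's `math.erf`), and the function names `erf_star`, `xi` and `error_bound` are hypothetical helpers, not names from the text.

```python
import math

def erf_star(x):
    """Unit normal cumulative distribution (assumed meaning of erf*)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def xi(n, snr):
    """Argument of erf* in (3.56); snr is the average ratio gamma^2_{N,AVE}."""
    return math.sqrt(n) * snr / (2.0 * math.sqrt(1.0 + snr))

def error_bound(n, snr):
    """Upper bound (3.57) on the probability of a particular error."""
    return 1.0 - erf_star(xi(n, snr))

if __name__ == "__main__":
    # One row per reference set size N, one column per signal-to-noise ratio.
    ratios = (0.05, 0.1, 0.2, 0.5, 1.0)
    for n in (10, 50, 250):
        print(n, ["%.3f" % error_bound(n, g) for g in ratios])
```

For a fixed signal-to-noise ratio the bound falls as N grows, which is the behavior the text attributes to Figure 11.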


3.11  Experimental  Probability  of  Error  Determination 

In order to assess the accuracy of the probability of error bound of
(3.57) and to investigate the improvement that might be obtained when
the noise component of the data image is correlated between trial
registrations, a series of simulations was performed.

For each simulation, the reference image was noise-free and noisy data
images were generated by adding artificial Gaussian noise with variance
$\sigma_n^2$ to a copy of the reference image. Since the reference image
is assumed "perfect", the auto-distance function provided a direct
measure of $m_{AVE}^2(dy,dx)$. Thus, for any reference index set and any
trial registration, the signal-to-noise ratio was known. The
probability of error was estimated by counting the fraction of the
total number of trials on which $d_I^2(dy,dx)$ was less than
$d_I^2(0,0)$. Figure 12 and Figure 13 present the results of two Monte
Carlo simulation runs of 100 trials each. The 100 trials provide 95%
confidence intervals ranging from $\pm .1$ for $P_\varepsilon = .5$ to
$+.05 / -.001$ for $P_\varepsilon$ near 0 [14].

In each case the reference index set spacing was selected in such a way
as to make $d_I^2(dy,dx)$ independent of $d_I^2(0,0)$ for the points
plotted. Figure 14 illustrates the decrease in probability of error
that occurs when $d_I^2(dy,dx)$ and $d_I^2(0,0)$ are correlated. The
reference set in this case was a contiguous block of pixels. It is
clear from these results that correlation between correctly registered
and misregistered distance function values may be exploited to reduce
the probability of error below the upper bound established by (3.57).
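The counting procedure described in this section can be sketched in a few lines. This is not the simulation used for Figures 12 through 14. It is a minimal stand-in that assumes uniform weights, a noise-free reference, and independent Gaussian noise in the two distance values (the uncorrelated case); the list `m` of noise-free differences and the function names are hypothetical.

```python
import math
import random

def simulate_error_rate(m, sigma, trials, rng):
    """Fraction of trials on which the misregistered distance is the smaller one.

    m holds the noise-free differences m_i(dy,dx) at the trial offset.
    """
    errors = 0
    for _ in range(trials):
        # Correct registration: only noise contributes to the distance.
        d00 = sum(rng.gauss(0.0, sigma) ** 2 for _ in m)
        # Trial registration: signal difference plus independent noise.
        dyx = sum((mi + rng.gauss(0.0, sigma)) ** 2 for mi in m)
        if dyx < d00:
            errors += 1
    return errors / trials

def error_bound(m, sigma):
    """Bound (3.57) evaluated for the same m and noise level."""
    n = len(m)
    snr = sum(mi * mi for mi in m) / (n * sigma ** 2)
    xi = math.sqrt(n) * snr / (2.0 * math.sqrt(1.0 + snr))
    return 1.0 - 0.5 * (1.0 + math.erf(xi / math.sqrt(2.0)))

if __name__ == "__main__":
    m = [0.4] * 50                      # gives an average ratio of 0.16
    rng = random.Random(1)
    print(simulate_error_rate(m, 1.0, 2000, rng), error_bound(m, 1.0))
```

With independent noise the estimate should sit close to the bound. Modelling the correlated case of Figure 14 would require drawing the two noise fields jointly, which this sketch does not attempt.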

3.12  Summary

Since $\gamma_{N,AVE}^2(dy,dx)$ depends on the sum of
$\sigma_{n,DATA}^2$ and $\sigma_{n,REF}^2$, the probability of error can
be reduced by reducing either of these two quantities. A trade-off will
generally be necessary with respect to reductions in
$\sigma_{n,DATA}^2$ for line scanning sensors. Inserting a low-pass
filter into the sensor output will reduce the noise variance after
digitization but will also increase the correlation time for the noise
and decrease the bandwidth of the sensor. The effect of reduced sensor
bandwidth is to decrease the magnitude of $m_i(0,dx)$ from what it
would be without a low-pass filter. While it is clearly necessary to
provide some low-pass filtering of the sensor output to suppress
aliasing, the decision on whether to reduce the bandwidth of the sensor
any further must be based on the anticipated scene content and sensor
noise.

Filtering can also be performed after sampling and/or digitization. In
Section 4.2 we will discuss a promising nonlinear technique.

The reduction of $\sigma_{n,REF}^2$ through processing of the raw data
images to form a low-noise reference image must be viewed as a strong
candidate. The objective is to register the sequence of incoming data
images and filter the time series presented at each pixel location to
form an estimate of the intensity at the corresponding point in the
underlying image. Filtering on a frame-to-frame basis preserves the
resolution of the raw sensor data and reduces the noise level of the
reference image by averaging the noise over a number of frames. It
also incorporates any changes in the underlying image into the
reference image, though necessarily with some delay. Chapter 5 will
deal with this subject in greater detail.
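As a rough illustration of frame-to-frame filtering (not the filter developed in Chapter 5), the sketch below keeps a running per-pixel mean of frames that are assumed to be already registered. The function name `update_reference` and the nested-list image layout are assumptions made for the example.

```python
import random

def update_reference(reference, frame, count):
    """Fold one registered frame into the running per-pixel mean.

    reference: current estimate (list of rows)
    frame:     newly registered data image of the same shape
    count:     number of frames already averaged into reference
    """
    weight = 1.0 / (count + 1)
    return [[r + weight * (f - r) for r, f in zip(ref_row, frm_row)]
            for ref_row, frm_row in zip(reference, frame)]

if __name__ == "__main__":
    rng = random.Random(0)
    truth = 10.0

    def noisy_frame():
        return [[truth + rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(16)]

    reference = noisy_frame()
    for count in range(1, 25):          # 25 frames in total
        reference = update_reference(reference, noisy_frame(), count)
    residual = [v - truth for row in reference for v in row]
    # For unit-variance noise this should be roughly 1/25.
    print(sum(e * e for e in residual) / len(residual))
```

A fixed weight in place of `1 / (count + 1)` would trade some noise reduction for a shorter delay in following changes in the underlying image.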

The only remaining variable to affect $\gamma_{N,AVE}^2$ is
$m_{AVE}^2(dy,dx)$, the square of the average gradient of the image in
the neighborhood of the set of pixels which comprise the comparison
set. It is clearly possible to select the set of pixels to be included
in the comparison set based on the gradient. In Section 4.3 we will
discuss an approach to this selection process based on a new gradient
magnitude estimation algorithm.

An alternate approach to maximizing $P_c(dy,dx)$ is to find an
appropriate weighting matrix A which weights the "better" components of
Ic more heavily. Section 4.1 contains the development and discussion
of this approach.

Chapter 4


SIGNAL-TO-NOISE ENHANCEMENT TECHNIQUES


In  this  chapter  three  new  techniques  are  developed  to  reduce 
the  probability  that  the  tracking  algorithm  will  make  a registration 
error.  Each  technique  is  demonstrated  and  two  of  them  are  included  in 
the  integrated  tracking  algorithm  which  is  developed  and  evaluated  in 
Chapter  6. 


4.1  Nonuniformly  Weighted  Norm 

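In the previous chapter, we dealt with the case of uniform weights for
each component of Ic. In this section, we will derive an expression
for the weights $a_{ii}$ which maximizes the lower bound on
$P_c(dy,dx)$ for a given set of $\{m_i^2(dy,dx)\}$ and noise variance
$\sigma_n^2$. Let

$$Y(dy,dx) \triangleq \frac{d_A^2(0,0) - d_A^2(dy,dx)}{\sigma_n^2} \tag{4.1}$$

and normalize Y(dy,dx) to zero mean and unit variance with the
transformation

$$X(dy,dx) = \frac{Y(dy,dx) - E[Y(dy,dx)]}{\sqrt{\mathrm{var}[Y(dy,dx)]}} \tag{4.2}$$

As in (3.44), the unit normal approximation for X gives

$$P_c(dy,dx) = \mathrm{erf}^*\!\left[\frac{-E[Y(dy,dx)]}{\sqrt{\mathrm{var}[Y(dy,dx)]}}\right] \tag{4.3}$$

The expected value of Y(dy,dx) is

$$\begin{aligned}
E[Y(dy,dx)] &= \frac{1}{\sigma_n^2}\left\{E[d_A^2(0,0)] - E[d_A^2(dy,dx)]\right\} \\
&= \frac{1}{\sigma_n^2}\sum_{i=1}^{N} a_{ii}\left\{E\!\left[(n_r(0,0,i) - n_d(0,0,i))^2\right] - E\!\left[(n_r(0,0,i) - m_i(dy,dx) - n_d(dy,dx,i))^2\right]\right\} \\
&= -\frac{1}{\sigma_n^2}\sum_{i=1}^{N} a_{ii}\,m_i^2(dy,dx)
\end{aligned} \tag{4.4}$$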


Since E[Y(dy,dx)] is negative, the lower bound on $P_c(dy,dx)$
represents the case in which var[Y(dy,dx)] takes on its maximum value.

$$\mathrm{var}[Y(dy,dx)] \le \frac{\mathrm{var}[d_A^2(0,0)] + \mathrm{var}[d_A^2(dy,dx)]}{\sigma_n^4} \tag{4.5}$$

$$\mathrm{var}[d_A^2(0,0)] = \mathrm{var}\!\left[\sum_{i=1}^{N} a_{ii}\,Ic(0,0,i)^2\right] = \sum_{i=1}^{N} a_{ii}^2\,\mathrm{var}[Ic(0,0,i)^2] = 2\sigma_n^4\sum_{i=1}^{N} a_{ii}^2 \tag{4.6}$$

$$\mathrm{var}[d_A^2(dy,dx)] = \mathrm{var}\!\left[\sum_{i=1}^{N} a_{ii}\,Ic(dy,dx,i)^2\right] = \sum_{i=1}^{N} a_{ii}^2\,\mathrm{var}[Ic(dy,dx,i)^2] = 2\sigma_n^2\sum_{i=1}^{N} a_{ii}^2\,[2m_i^2(dy,dx) + \sigma_n^2] \tag{4.7}$$

Substituting (4.6) and (4.7) into (4.5) yields

$$\mathrm{var}[Y(dy,dx)] \le \frac{4}{\sigma_n^2}\sum_{i=1}^{N} a_{ii}^2\,[m_i^2(dy,dx) + \sigma_n^2] \tag{4.8}$$

Again substituting (4.4) and (4.8) into (4.3), we get the desired
expression for the lower bound of $P_c(dy,dx)$.

$$P_c(dy,dx) \ge \mathrm{erf}^*\!\left[\frac{\dfrac{1}{\sigma_n}\displaystyle\sum_{i=1}^{N} a_{ii}\,m_i^2(dy,dx)}{2\sqrt{\displaystyle\sum_{i=1}^{N} a_{ii}^2\,[m_i^2(dy,dx) + \sigma_n^2]}}\right] \tag{4.9}$$

Since we are interested in finding the matrix coefficients $a_{ii}$
which maximize $P_c(dy,dx)$, we observe that $\mathrm{erf}^*(\cdot)$ is a
monotonic increasing function, and it is only necessary to maximize the
argument in order to maximize the value of the function. Let

$$g_N(dy,dx) = \frac{\displaystyle\sum_{i=1}^{N} a_{ii}\,m_i^2(dy,dx)}{2\sigma_n\sqrt{\displaystyle\sum_{i=1}^{N} a_{ii}^2\,[m_i^2(dy,dx) + \sigma_n^2]}} \tag{4.10}$$


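We take the partial derivative of $g_N(dy,dx)$ with respect to each
non-zero element of A, and set the resulting expressions to zero,

$$\frac{\partial g_N(dy,dx)}{\partial a_{kk}} = 0 \tag{4.11}$$

which gives

$$\frac{a_{kk}\,[m_k^2(dy,dx) + \sigma_n^2]}{m_k^2(dy,dx)} = \frac{\displaystyle\sum_{i=1}^{N} a_{ii}^2\,[m_i^2(dy,dx) + \sigma_n^2]}{\displaystyle\sum_{i=1}^{N} a_{ii}\,m_i^2(dy,dx)} \tag{4.12}$$

for each k, k=1,2,...,N. We now have N simultaneous equations, all of
which have the same term on the right-hand side. Since the term on the
right-hand side of (4.12) does not depend on k, it is an arbitrary
constant. We set this constant to one,

$$\frac{\displaystyle\sum_{i=1}^{N} a_{ii}^2\,[m_i^2(dy,dx) + \sigma_n^2]}{\displaystyle\sum_{i=1}^{N} a_{ii}\,m_i^2(dy,dx)} = 1 \tag{4.13}$$

then

$$a_{kk}(dy,dx) = \frac{m_k^2(dy,dx)}{m_k^2(dy,dx) + \sigma_n^2} = \frac{\gamma_k^2(dy,dx)}{1 + \gamma_k^2(dy,dx)} \tag{4.14}$$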

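Taking the second derivative of (4.10) and evaluating it at (4.14),

$$\frac{\partial^2 g_N}{\partial a_{kk}^2} = \frac{m_k^2 + \sigma_n^2}{2\sigma_n\left[\displaystyle\sum_{i=1}^{N} a_{ii}\,m_i^2\right]^{1/2}}\left\{\frac{a_{kk}\,m_k^2}{\displaystyle\sum_{i=1}^{N} a_{ii}\,m_i^2} - 1\right\} < 0 \tag{4.16}$$

Thus (4.14) maximizes $g_N$.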
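The effect of the weights in (4.14) on the argument $g_N$ of (4.10) can be checked numerically. The sketch below is illustrative only: the values in `m_sq` (the $m_k^2$ for one trial offset) are made up, and `optimal_weights` and `g` are hypothetical helper names.

```python
import math

def optimal_weights(m_sq, noise_var):
    """Weights a_kk of (4.14): m_k^2 / (m_k^2 + sigma_n^2)."""
    return [m / (m + noise_var) for m in m_sq]

def g(weights, m_sq, noise_var):
    """Argument of erf* in the lower bound, as written in (4.10)."""
    num = sum(a * m for a, m in zip(weights, m_sq))
    den = 2.0 * math.sqrt(noise_var) * math.sqrt(
        sum(a * a * (m + noise_var) for a, m in zip(weights, m_sq)))
    return num / den

if __name__ == "__main__":
    m_sq = [4.0, 1.0, 0.25, 0.01, 0.0]   # a mix of "good" and "bad" pixels
    noise_var = 1.0
    uniform = [1.0] * len(m_sq)
    best = optimal_weights(m_sq, noise_var)
    print(g(uniform, m_sq, noise_var), g(best, m_sq, noise_var))
```

Scaling all weights by a common factor leaves `g` unchanged, which is why the constant in (4.13) can be set to one. When every pixel has the same $m_k^2$ the two results coincide, as the text notes for reference sets of uniform "quality".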

For any practical application of the weights derived here, their
dependence on (dy,dx) must be removed. For any trial registration
attempted, the actual translation relative to correct registration is
unknown, and thus the appropriate set of weights is unknown. Several
approaches are possible.

If the sensor response time is long compared to the reciprocal of the
sampling rate, then it seems reasonable to assume that $m^2(0,k)$ will
be consistently smaller than $m^2(k,0)$ for small values of k (Figure 8
shows evidence of this fact). Under this condition it might be
desirable to use the weights associated with minimizing the probability
of error in a direction parallel to the scan direction.

If the sensor resolution is the same in both the horizontal and
vertical directions it is possible to argue that the direction of the
gradient vector is uniformly distributed on $[0, 2\pi)$. Assume that

$$m_k(dy,dx) = \nabla Ic(0,0,k)\cdot\mathbf{1}_{dy,dx}\,\sqrt{dy^2 + dx^2}$$

where the angle is measured from $\mathbf{1}_{dy,dx}$ to the gradient
vector. Now we take the expected value of $a_{kk}$.

If a probability distribution for r were known, then the expected value
of $a_{kk}$ could be found. In the absence of a model for r, we note
that the most likely errors are those associated with small values of r
(zero or one if the tracker is working well) so that it would seem
prudent to select $a_{kk}$ to avoid the most likely errors. With


a 

kk 


1 


(4.21) 


The benefits of using nonuniform weights when computing the distance
function depend on the presence of both relatively good and relatively
bad points in the reference set. If all of the pixels in the reference
set are of the same "quality", then they will have the same weights,
which is equivalent to having uniform weights.

Figure  15  and  Figure  16  illustrate  the  improvement  in  the 
normalized  cross-distance  function  that  is  obtained  by  the  use  of  the 
nonuniform  weights  as  specified  by  (4.21). 

The  reference  set  was  the  32  by  32  block  of  adjacent  pixels 
with  upper  right  corner  located  at  row  43  and  column  43  of  the 
reference  image  shown  in  Figure  63.  Figure  15  and  Figure  16  are 
normalized  so  that  the  distance  function  value  corresponding  to  correct 
registration  is  1.00. 

[Figure 16. Normalized Cross-Distance Function Using Optimized Weights.
Reference set: 32x32 block. Horizontal axis: dy (dx = 0), from -6 to 6.]

It is of questionable value to utilize both nonuniform weights and an
adaptive reference set selection process, since the reference set
selection process will presumably incorporate into the reference set
only those pixels which would be heavily weighted anyway. Another
caveat that must be placed on the use of nonuniform weights is that the
assumptions which were made to eliminate the dependence of the weights
on the translation may not be appropriate for all sensors and classes
of imagery, and may, in fact, be invalidated by an adaptive
reference-selection algorithm.


4.2  Nonlinear  Peak  Elimination  Filter 


In addition to frequency-response-shaping filters employed ahead of the
sampling and digitization steps of the tracker, the opportunity exists
to filter the sensor output in the discrete domain prior to performing
the similarity detection operation. The following ad hoc nonlinear
filter is an example of an easily mechanized algorithm which shows a
potential for reducing the random noise component in sampled imagery.

The algorithm to be used is as follows:

1)  Examine  each  pixel  in  an  image  sequentially  by 
rows  starting  with  the  upper  left  corner. 

2)  If  the  pixel  under  examination  does  not  have  a 
neighbor  above,  below,  on  the  left  and  on  the  right,  go 
on  to  the  next  pixel  (i.e.  if  it  lies  on  an  edge,  don't 
process  it). 

3)  If the pixel under examination has a value
greater than the maximum value of its four nearest
neighbors, replace it with the maximum of the four
neighbors.

4)  If  the  pixel  under  examination  has  a value 
smaller  than  the  minimum  value  of  its  four  nearest 
neighbors,  replace  it  with  the  minimum  of  the  four 
nearest  neighbors. 
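The four steps above can be sketched as follows. The text does not say whether a replaced value is used when later pixels are examined. This sketch assumes it is not: neighbors are always read from the input image and results are written to a copy. The function name and the nested-list image layout are also assumptions.

```python
def peak_elimination(image):
    """Clip single-pixel peaks and pits to the range of their four nearest neighbors.

    image is a list of equal-length rows. Border pixels (step 2) are copied
    unchanged. Neighbor values come from the input image, not from the output.
    """
    rows, cols = len(image), len(image[0])
    out = [list(row) for row in image]
    for i in range(1, rows - 1):            # step 1: scan by rows
        for j in range(1, cols - 1):
            neighbors = (image[i - 1][j], image[i + 1][j],
                         image[i][j - 1], image[i][j + 1])
            hi, lo = max(neighbors), min(neighbors)
            if image[i][j] > hi:            # step 3: isolated peak
                out[i][j] = hi
            elif image[i][j] < lo:          # step 4: isolated pit
                out[i][j] = lo
    return out

if __name__ == "__main__":
    print(peak_elimination([[1, 1, 1],
                            [1, 9, 2],
                            [1, 1, 1]]))
```

A pixel that already lies within the range of its neighbors is left alone, so edges and ramps pass through unchanged.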
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In  order  to  understand  the  motivation  behind  this  operation, 
just look around and try to find a spot that is at the same time either
brighter (an intensity peak) or darker (an intensity pit) than its
neighbors and so small that it does not have a detectable shape. This
spot, if you can find one, is the visual analog of a single resolution
element in a digitized image which satisfies the requirement of being
detectably brighter or darker than its surroundings. The general
difficulty of finding such points leads to the following questions:

1)  How  often  do  single  pixel  peaks  and  pits  occur 
in  an  image  made  from  pure  noise? 

2)  Is  there  any  benefit  to  removing  single  pixel 
peaks  and  pits  from  images  which  contain  both  signals 
and  noise,  and  if  so,  how  big  is  the  benefit,  and  how 
can  it  be  characterized? 

To answer these questions, we will first show that for independent,
identically distributed random variables arranged and labeled as in
Figure 17, the probability that $x_1$ is either the largest or the
smallest of the set of five is .4, regardless of the form of the
probability distribution.

Let $F_{x_i}$ be the cumulative distribution function associated with
$x_i$, let $f_{x_i}$ be the corresponding probability density function,
and assume that all of the density functions are continuous. We wish to
establish the probability that $x_1$ is a local extreme point, i.e.
that it is either larger or smaller than all four of its nearest
neighbors.

[Figure 17. The Four Nearest Neighbors of $x_1$]

The result is

$$P[x_1 \text{ is an extreme point}] = \frac{2}{5} \tag{4.35}$$

independent of the distribution of the $x_i$.
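The 2/5 figure can be checked with a short calculation. The steps below are a sketch under the stated assumptions (independent, identically distributed, continuous variables), writing F and f for the common distribution and density. They are not necessarily the steps of the original derivation.

```latex
\begin{align*}
P[x_1 > x_2,\ldots,x_5]
  &= \int_{-\infty}^{\infty} f(x)\,[F(x)]^4\,dx
   = \left.\frac{[F(x)]^5}{5}\right|_{-\infty}^{\infty} = \frac{1}{5} \\
P[x_1 < x_2,\ldots,x_5]
  &= \int_{-\infty}^{\infty} f(x)\,[1 - F(x)]^4\,dx
   = \left.-\frac{[1 - F(x)]^5}{5}\right|_{-\infty}^{\infty} = \frac{1}{5} \\
P[x_1 \text{ is an extreme point}]
  &= \frac{1}{5} + \frac{1}{5} = \frac{2}{5}
\end{align*}
```

The two events are mutually exclusive, so their probabilities add, and neither integral depends on the form of F.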

How  does  this  40%  figure  for  a pure  noise  image  compare  with 
natural  imagery  after  it  has  been  sensed,  sampled,  and  digitized? 

Figure  56,  Figure  57,  and  Figure  58  show  samples  from  the  image 
sequences  CARS,  TREES,  and  AIRPLANE. 

Table  III  lists  the  percentage  of  pixels  in  each  of  four 
images  that  are  local  extreme,  and  Table  IV  illustrates  the  change  in 
the  percentage  of  local  extreme  points  as  noise  of  increasing  variance 
is  added  to  frame  1 from  image  sequence  AIRPLANE.  As  the  noise 
variance  is  increased,  the  fraction  of  pixels  that  are  local  peaks  or 
pits  approaches  the  limit  of  .4  predicted  by  theory. 
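The approach to the .4 limit is easy to reproduce for synthetic noise. The sketch below counts interior pixels that are strictly above or strictly below all four nearest neighbors in an image of independent samples. The function names and the 100 by 100 image size are assumptions made for illustration, and the result says nothing about the specific images of Tables III and IV.

```python
import random

def extreme_fraction(image):
    """Fraction of interior pixels that are single-pixel peaks or pits."""
    rows, cols = len(image), len(image[0])
    count = 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbors = (image[i - 1][j], image[i + 1][j],
                         image[i][j - 1], image[i][j + 1])
            value = image[i][j]
            if value > max(neighbors) or value < min(neighbors):
                count += 1
    return count / ((rows - 2) * (cols - 2))

def noise_image(rows, cols, draw):
    """Image of independent samples; draw() returns one sample."""
    return [[draw() for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    gaussian = noise_image(100, 100, lambda: rng.gauss(0.0, 1.0))
    uniform = noise_image(100, 100, rng.random)
    # Both fractions should land near 0.4 whatever the distribution.
    print(extreme_fraction(gaussian), extreme_fraction(uniform))
```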

Figure  18  illustrates  the  noise  distribution  before  and  after 
application  of  the  filter  for  an  image  containing  pure  random  noise. 

Based  on  these  results,  the  non-linear  peak  elimination  filter 
shows  considerable  promise  for  reducing  the  noise  component  of  imagery 
in  regions  of  low  contrast  (regions  with  a high  degree  of  local 
randomness)  while  leaving  relatively  unchanged  the  signal  component 
(persistent  local  intensity  gradients)  of  the  same  imagery.  The 
performance  improvement  of  a tracking  system  using  the  peak  elimination 
prefilter  will  be  documented  in  Chapter  6. 


Table III.  Extreme point statistics for real data images

Image                      % of total pixels that are peaks or pits
Frame 1 of CARS            15.5
Frame 1 of TREES           20.7
Frame 1 of AIRPLANE        24.0
Pure noise                 39.4

Table IV.  Effects of additive noise on extreme point statistics

Variance of additive noise   % of total pixels that are peaks or pits
0.00 (original image)        24.0
0.04                         26.6
0.25                         30.2
1.00                         34.4
4.00                         36.6
16.00                        38.9
25.00                        38.8
100.00                       39.5

[Figure 18. Noise Distribution Before and After Smoothing with the
Non-Linear Peak Elimination Filter. Panels: initial distribution of
pixel intensities for a pure noise image; distribution of pixel
intensities after smoothing (variance = .48). Horizontal axis:
intensity.]


4.3  Adaptive  Reference  Set  Selection 


In Section 3.12 we saw that it is advantageous (with respect to
minimizing the probability of error) to maximize $m_{N,AVE}^2(dy,dx)$
for any particular N. In this section, we will develop a technique for
adaptively selecting the reference set which maximizes $m_{N,AVE}^2$.

Consider an arbitrary pixel Ic(0,0,k) in the comparison set. The facet
model assumes that the gradient is approximately constant in a region
surrounding any point. The contribution of $\nabla Ic(0,0,k)$ to
$m_{N,AVE}^2(dy,dx)$ is

$$m_k^2(dy,dx) = \left[\nabla Ic(0,0,k)\cdot(dy\,\mathbf{1}_y + dx\,\mathbf{1}_x)\right]^2 = |\nabla Ic(0,0,k)|^2\,r^2\cos^2\psi \tag{4.36}$$

where $\psi$ is the angle between $(dy\,\mathbf{1}_y + dx\,\mathbf{1}_x)$
and $\nabla Ic(0,0,k)$, and $r^2 = dy^2 + dx^2$. We see that
$m_k^2(dy,dx)$ is proportional to the square of the gradient magnitude
at Ic(0,0,k). To maximize $m_{N,AVE}^2$, select the N pixels in the
reference image which have the largest gradient magnitudes. This
selection can be performed in three steps:

1)  Calculate the gradient magnitude for each pixel
in the reference image and form a histogram of gradient
magnitudes.

2)  Starting with the largest gradient magnitude
and working down, find the largest threshold value such
that there are at least N pixels with gradient
magnitudes greater than or equal to the threshold.

3)  Find N pixels in the reference image which have
a gradient magnitude greater than or equal to the
threshold.
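The three steps above can be sketched as follows. The gradient here is a plain central difference, used only as a stand-in for the estimator developed later in this chapter. The bin count, the raster-order choice among qualifying pixels, and the function names are likewise assumptions.

```python
import math

def gradient_magnitudes(image):
    """Central-difference gradient magnitude at each interior pixel (stand-in)."""
    mags = {}
    for i in range(1, len(image) - 1):
        for j in range(1, len(image[0]) - 1):
            gx = (image[i][j + 1] - image[i][j - 1]) / 2.0
            gy = (image[i + 1][j] - image[i - 1][j]) / 2.0
            mags[(i, j)] = math.hypot(gx, gy)
    return mags

def select_reference_set(image, n, bins=64):
    """Pick n pixel locations whose gradient magnitude clears the histogram threshold."""
    mags = gradient_magnitudes(image)
    top = max(mags.values())
    if top == 0.0:
        return []                      # flat image: nothing to select

    def bin_of(g):
        return min(int(g * bins / top), bins - 1)

    # Step 1: histogram of gradient magnitudes.
    hist = [0] * bins
    for g in mags.values():
        hist[bin_of(g)] += 1
    # Step 2: walk down from the top bin until at least n pixels are covered.
    covered, threshold_bin = 0, 0
    for b in range(bins - 1, -1, -1):
        covered += hist[b]
        if covered >= n:
            threshold_bin = b
            break
    # Step 3: take n pixels at or above the threshold, in raster order.
    chosen = [p for p, g in mags.items() if bin_of(g) >= threshold_bin]
    return chosen[:n]

if __name__ == "__main__":
    step_edge = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
    print(select_reference_set(step_edge, 4))
```

On the step-edge example every selected location lies in one of the two columns adjacent to the edge, where the central difference is non-zero.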

A hazard exists with this approach to selecting the reference set. If
the resolution of the sensor in the scan direction is substantially
less than the resolution perpendicular to the scan direction, it is
possible for this algorithm to select N pixels with all gradient
vectors perpendicular to the scan direction. The result is a very
small value for $m_{N,AVE}^2(0,dx)$ and an increased probability of
errors parallel to the scan direction. The indication is that the
angular resolution of the sensor after digitization should be
approximately the same in each axis. This problem can be lessened
somewhat by weighting the gradient component parallel to the scan
direction more heavily than the component perpendicular to the scan
direction when computing the gradient magnitude, or by using only the
component parallel to the scan direction to form the histogram. For
the three data sequences used for evaluation of tracking algorithms,
neither of these strategies was required.

Since the gradient magnitudes are used in decreasing order to assemble
the reference set, the possibility exists that the probability of error
is not a monotone decreasing function of N. What conditions would have
to exist in order for this to occur, and what precautions should be
taken to prevent it?

If $P_\varepsilon$ is not a monotonic decreasing function of N, then
there exists some k such that

$$P_\varepsilon[dy,dx \mid N=k+1] > P_\varepsilon[dy,dx \mid N=k] \tag{4.37}$$

For example, suppose that $\gamma_{50,AVE}^2 = .160$ when the 50 pixels
with largest gradient magnitude are included in the reference set, and
in addition, suppose that the next 200 pixels in the gradient magnitude
histogram had signal-to-noise ratios of .07. From (3.57) the maximum
probability of error for N = 50 and $\gamma_{N,AVE}^2 = .160$ is
$1 - \mathrm{erf}^*[.52523]$. If the next pixel to be incorporated into
the reference set has $\gamma_{51}^2 = .07$, then

$$\gamma_{51,AVE}^2 = \frac{.160 \times 50 + .07}{51} = .158$$

and the maximum probability of error for N = 51 and
$\gamma_{N,AVE}^2 = .158$ is $1 - \mathrm{erf}^*[.52500]$. This increase
in the probability of error indicates that the pixel with
$\gamma^2 = .07$ should not be incorporated into the reference set.
However, if all 200 pixels with $\gamma^2 = .07$ are included in the
reference set, then

$$\gamma_{250,AVE}^2 = \frac{.160 \times 50 + .07 \times 200}{250} = .088$$

and the probability of error is $1 - \mathrm{erf}^*[.66697]$. From this
we see that there are cases where the inclusion of additional pixels in
the reference set does not automatically decrease the probability of
error.
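The numbers in this example can be reproduced directly from (3.56). The sketch below is an illustration: `xi` and `running_xi` are hypothetical helper names, and the per-pixel ratios are the ones assumed in the example (50 pixels at .160 followed by 200 pixels at .07).

```python
import math

def xi(n, snr_avg):
    """Argument of erf* from (3.56) for n pixels with average ratio snr_avg."""
    return math.sqrt(n) * snr_avg / (2.0 * math.sqrt(1.0 + snr_avg))

def running_xi(snrs):
    """xi_N for N = 1..len(snrs), adding pixels in the given order."""
    values, total = [], 0.0
    for n, ratio in enumerate(snrs, start=1):
        total += ratio
        values.append(xi(n, total / n))
    return values

if __name__ == "__main__":
    values = running_xi([0.16] * 50 + [0.07] * 200)
    for n in (50, 51, 250):
        print(n, round(values[n - 1], 5))
```

The argument dips slightly when the 51st pixel is added and then climbs past its N = 50 value as more of the .07 pixels are included, which is the non-monotonic behavior described above.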

We will next establish a lower bound for $\gamma_{N+1}^2$ as a function
of $\gamma_{N,AVE}^2$ and N such that $P_\varepsilon$ will always
decrease so long as $\gamma_{N+1}^2$ is greater than the bound.
Let

$$\gamma_i^2 = \frac{m_i^2}{\sigma_n^2} \tag{4.38}$$

$$\gamma_{N,AVE}^2 = \frac{1}{N}\sum_{i=1}^{N}\gamma_i^2 \tag{4.39}$$

$$\xi_N = \frac{\sqrt{N}\,\gamma_{N,AVE}^2}{2\sqrt{1 + \gamma_{N,AVE}^2}} \tag{4.40}$$

where we drop the specification of (dy,dx) to simplify the notation.
Recall that $\xi_N$ is the argument of $\mathrm{erf}^*(\cdot)$ which
establishes the upper bound on the probability of error. From (3.57)
we see that increasing $\xi_N$ decreases the upper bound on the
probability of error:

$$\xi_N < \xi_{N+1} \;\Longrightarrow\; P_\varepsilon[dy,dx \mid N=k] > P_\varepsilon[dy,dx \mid N=k+1] \tag{4.41}$$

We are looking for a lower bound on $\gamma_{N+1}^2$ which will cause
$\xi_{N+1}$ to be greater than $\xi_N$:

$$\frac{\sqrt{N+1}\,\gamma_{N+1,AVE}^2}{2\sqrt{1 + \gamma_{N+1,AVE}^2}} > \frac{\sqrt{N}\,\gamma_{N,AVE}^2}{2\sqrt{1 + \gamma_{N,AVE}^2}} \tag{4.42}$$

Substituting (4.39) for $\gamma_{N,AVE}^2$ and $\gamma_{N+1,AVE}^2$ and
collecting terms in powers of $\gamma_{N+1}^2$,

$$\gamma_{N+1}^4\,[1 + \gamma_{N,AVE}^2] + \gamma_{N+1}^2\,N\,[(\gamma_{N,AVE}^2)^2 + 2\gamma_{N,AVE}^2] - N(\gamma_{N,AVE}^2)^2 > 0 \tag{4.43}$$

and solving for the lower bound on $\gamma_{N+1}^2$,

$$\gamma_{N+1}^2 > \frac{-N[(\gamma_{N,AVE}^2)^2 + 2\gamma_{N,AVE}^2] + \sqrt{N^2[(\gamma_{N,AVE}^2)^2 + 2\gamma_{N,AVE}^2]^2 + 4N(\gamma_{N,AVE}^2)^2[1 + \gamma_{N,AVE}^2]}}{2[1 + \gamma_{N,AVE}^2]} \tag{4.44}$$

For N greater than 10, this lower bound for $\gamma_{N+1}^2$ is
relatively insensitive to N. Figure 19 shows the behavior of the ratio
of the lower bound for $\gamma_{N+1}^2$ to $\gamma_{N,AVE}^2$. The
signal-to-noise ratio for the next pixel divided by the current average
signal-to-noise ratio must lie above the curve to assure that the upper
bound on probability of error is a monotonic decreasing function of N.

There seems to be only one realistic situation where it is likely that
the upper bound on probability of error is not a monotonic decreasing
function of N. This case will occur when the reference image contains
a small number of very high contrast pixels on a low contrast
background. The histogram of gradient magnitudes will contain a few
points in the high value bins with a large span of vacant bins
separating these from the remainder of the image. This case did not
arise in any of the imagery used for the evaluation of this reference
set selection algorithm.
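The lower bound can be evaluated by taking the positive root of the quadratic in (4.43). The sketch below is an illustration only. The function names are hypothetical, and the closed form used is simply the quadratic formula applied to (4.43).

```python
import math

def xi(n, snr_avg):
    """Argument of erf* from (4.40)."""
    return math.sqrt(n) * snr_avg / (2.0 * math.sqrt(1.0 + snr_avg))

def snr_lower_bound(n, snr_avg):
    """Smallest ratio for pixel N+1 that keeps xi from decreasing.

    Positive root of (4.43), written as a*x^2 + b*x + c = 0.
    """
    a = 1.0 + snr_avg
    b = n * (snr_avg ** 2 + 2.0 * snr_avg)
    c = -n * snr_avg ** 2
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

if __name__ == "__main__":
    # Ratio of the bound to the current average, in the spirit of Figure 19.
    for n in (1, 2, 5, 10, 50, 250):
        print(n, round(snr_lower_bound(n, 0.16) / 0.16, 4))
```

For the earlier example (N = 50, average ratio .160) the bound comes out a little above .07, consistent with the observation that adding a single pixel with ratio .07 raises the bound on the probability of error.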


There  is  an  additional  restriction  on  the  inclusion  of  a 


particular  pixel  in  the  reference  set.  For  integer-valued  imagery,  a 
pixel  should  be  considered  for  inclusion  in  the  reference  set  only  if 
the  gradient  magnitude  in  the  region  surrounding  the  pixel  in  question 
is  sufficiently  large  to  ensure  that  there  will  be  a contribution  to 
the  distance  function  for  some  trial  registration. 

To  see  this,  consider  a one-dimensional  example.  Figure  20 
illustrates  a hypothetical  one-dimensional  intensity  profile  for  a 
digitized  image. 

The  pixel  labeled  X can  be  shifted  right  or  left  by  as  much  as 
four  pixels  without  contributing  anything  to  the  distance  function.  If 
the  search  region  is  plus  and  minus  two  pixels,  the  inclusion  of  X in 
the  reference  set  contributes  nothing  to  discovering  any 
misregistration  between  the  reference  image  and  the  data  image,  even  if 
the  actual  misregistration  is  the  maximum  allowable  value  (two  pixels). 
The local average derivative in the region surrounding X must be
greater than $1/(4R_{MAX} + 1)$ to contribute to the distance function
at any allowable misregistration, and a more practical limit would
require a local average derivative sufficiently large to produce a
distance function contribution at the edge of the search region even
when the data image and the reference image are perfectly registered.
For this reason, we place the following constraint on the pixels to be
included in the reference set:

$$|\nabla Ir(i,j)| > \frac{1}{R_{MAX}} \tag{4.45}$$

where $R_{MAX}$, defined in (4.46), is set by the extent of the search
region.

For $|\nabla Ir(i,j)|$ less than this limit, and a perfectly registered
data image, there is no trial registration in a noise-free image which
has any error signal attributable to the inclusion of (i,j) in L.

There  are  two  types  of  regions  which  will  exhibit  the  desired 
characteristic: 

1)  Local  extreme  points  (either  peaks  or  pits) 
will  exhibit  a change  in  intensity  in  every  direction. 

2)  Points  which  lie  on  edges  will  show  a change  in 
intensity  in  directions  perpendicular  to  the  edge. 

Since  any  significant  peak  or  pit  will  be  surrounded  by  an 

edge,  the  approach  to  be  taken  will  be  to  estimate  the  gradient 

magnitude  at  each  point  in  the  image  and  use  the  N points  with  the 

largest  gradient  magnitude  values  for  the  reference  set. 

In the remainder of this chapter we will develop a gradient magnitude
estimation technique for selecting the reference set and demonstrate
the potential for increasing $\gamma_{N,AVE}^2$ by adaptive reference
set selection.

4.3.1  Gradient Magnitude Estimation

The approach to estimating the gradient magnitude at a point in the
reference image will be to estimate the horizontal and vertical
components separately and combine them according to

$$|\hat{\nabla} Ir(i,j)|^2 = [\hat{\nabla} Ir(i,j)\cdot\mathbf{1}_x]^2 + [\hat{\nabla} Ir(i,j)\cdot\mathbf{1}_y]^2 \tag{4.47}$$

The error to avoid is that of incorrectly estimating that

$$|\nabla Ir(i,j)| > \frac{1}{R_{MAX}} \tag{4.48}$$

and hence making Ir(i,j) a candidate for inclusion in the comparison
set when it will only contribute to the noise and never contribute to
the signal.

A rectangular search area will be assumed, with

$$R^2 = R_x^2 + R_y^2 \tag{4.49}$$

where

$$R_x = \eta_x\,dx_{MAX} \tag{4.50}$$

$$R_y = \eta_y\,dy_{MAX} \tag{4.51}$$

and $\eta_x$ and $\eta_y$ are the scale factors which convert pixel
spacing in x and y to horizontal and vertical angular displacements.

The  following  assumptions  are  made  about  the  characteristics  of 
the  image: 


1)  The  distance  over  which  a gradient  component 
persists  is  roughly  proportional  to  the  inverse  of  its 
magnitude  (i.e.  low  gradients  persist  for  longer 
distances  than  do  higher  gradients).  If  this  was  not 
true,  then  a histogram  of  gradient  magnitudes  would 
cluster  away  from  the  origin.  As  we  shall  see,  this  is 
not  the  case. 

2)  The occurrence of single pixel extreme points
that are not due to noise phenomena is relatively rare
in a randomly selected image.  Table III indicates that
this is reasonable.

3)  The  covariance  of  the  noise  in  the  image  can  be 
modeled  as  a zero-mean,  first-order  Markov  process  in 
each  axis  (see  Section  3.8). 

In order to bound the rate at which low gradient points are erroneously
determined to have sufficient signal strength to be included in the
reference set, a restriction is placed on the allowable performance of
the estimator. For each component of the gradient, the following
criterion must be met:

$$\frac{|\hat{\nabla} Ir\cdot\mathbf{1}_i| - \dfrac{1}{R_i}}{\sigma_{\hat{\nabla} Ir\cdot\mathbf{1}_i}} \ge \text{constant} = C \tag{4.52}$$

where $\hat{\nabla} Ir\cdot\mathbf{1}_i$ is the estimated gradient
component in the $\mathbf{1}_i$ direction, $R_i$ is the search radius
in the $\mathbf{1}_i$ direction, and
$\sigma_{\hat{\nabla} Ir\cdot\mathbf{1}_i}$ is the standard deviation of
the estimate of $\nabla Ir\cdot\mathbf{1}_i$. This restriction on the
estimator will ensure that there are at least C standard deviations
between the estimate of the gradient component magnitude and the
$1/R_i$ point (see Figure 21). The effect is to build an estimator
which has a variable confidence interval, but operates with a fixed
maximum probability of error.

With this approach to estimating the gradient magnitude components, the
probability of erroneously including a point in the reference set can
be approximated as

$$P[\text{ERRONEOUS INCLUSION}] = [\mathrm{erf}^*(-C)]^2 \tag{4.53}$$

where we assume that errors in the x-component and y-component
estimates are independent.

This probability can be made arbitrarily small by increasing C at the
cost of reducing the number of pixel locations that are candidates for
inclusion in L.
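To get a feel for how quickly (4.53) falls with C, the sketch below evaluates it for a few values. It assumes erf*(x) is the unit normal cumulative distribution, and the function names are hypothetical.

```python
import math

def erf_star(x):
    """Unit normal cumulative distribution (assumed meaning of erf*)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_erroneous_inclusion(c):
    """Approximation (4.53) for a margin of c standard deviations."""
    return erf_star(-c) ** 2

if __name__ == "__main__":
    for c in (0.0, 1.0, 2.0, 3.0):
        print(c, "%.2e" % p_erroneous_inclusion(c))
```

The value at C = 0 is one quarter, since each component test is then passed by chance half the time under the independence assumption.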

In  estimating  the  gradient  components,  each  component  is 
modeled  as  a constant  in  the  region  used  to  form  the  estimate.  For  the 
purpose  of  developing  the  necessary  equations,  only  the  x-component 
will  be  dealt  with.  The  y-component  differs  only  in  the  correlation 
time used in the Markov model for the noise. Let

$$S^x_K(i,j) = \sum_{n=1}^{K}\left[I_r(i,j+n) - I_r(i,j-n)\right] \qquad (4.54)$$


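With the approximation that

$$I_r(i,j+n) = I_r(i,j) + n\,\eta_x\,\nabla I_r(i,j)\cdot\hat{1}_x \qquad (4.55)$$

$S^x_K(i,j)$ becomes

$$S^x_K(i,j) = \sum_{n=1}^{K} 2n\,\eta_x\,\nabla I_r(i,j)\cdot\hat{1}_x = 2\,\nabla I_r(i,j)\cdot\hat{1}_x\,\eta_x\sum_{n=1}^{K} n = K(K+1)\,\eta_x\,\nabla I_r(i,j)\cdot\hat{1}_x \qquad (4.56)$$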


From (4.56) we define the estimate for $\nabla I_r(i,j)\cdot\hat{1}_x$ using $2K$ adjacent pixels to be

$$\widehat{\nabla I_r}(i,j)\cdot\hat{1}_x = \frac{S^x_K(i,j)}{K(K+1)\,\eta_x} \qquad (4.57)$$


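The required constraint reduces to

$$\left|\widehat{\nabla I_r}(i,j)\cdot\hat{1}_x\right| \;\ge\; C\,\sigma_{\widehat{\nabla I_r}\cdot\hat{1}_x} + \frac{1}{R_x} \qquad (4.58)$$

or, after substituting (4.57) for $\widehat{\nabla I_r}(i,j)\cdot\hat{1}_x$,

$$\left|S^x_K(i,j)\right| \;\ge\; K(K+1)\,\eta_x\left[C\,\sigma_{\widehat{\nabla I_r}\cdot\hat{1}_x} + \frac{1}{R_x}\right] \qquad (4.59)$$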


Determination of the variance of the estimate requires a knowledge of the covariance of the noise associated with the $2K$ adjacent pixels used in forming the estimate. Let $z^x_K(i,j)$ be the vector made up of the set of $2K+1$ pixels used to form the estimate,

$$\left[z^x_K(i,j)\right]^T = \left[I_r(i,j-K),\; I_r(i,j-K+1),\;\ldots,\; I_r(i,j+K)\right] \qquad (4.60)$$

$$E\left\{\left(z^x_K(i,j) - E[z^x_K(i,j)]\right)\left(z^x_K(i,j) - E[z^x_K(i,j)]\right)^T\right\} = P^x_K(i,j) \qquad (4.61)$$


Since the noise is modeled as a first-order Markov process with zero mean, the elements of the covariance matrix are

$$P^x_K(i,j) = \sigma_n^2\,\exp\left(-\mu_x\,\eta_x\,|i-j|\right) \qquad (4.62)$$

where $\sigma_n^2$ is the variance of the error in $I_r$, and $\mu_x\eta_x$ is the correlation time for the noise. Let $\omega_K$ be a $2K+1$ element constant vector

$$\omega_K(i) = \begin{cases} -1 & \text{for } 0 \le i < K \\ \;\;\,0 & \text{for } i = K \\ \;\;\,1 & \text{for } K < i \le 2K \end{cases} \qquad (4.63)$$


$S^x_K(i,j)$ can now be expressed in a more compact notation

$$S^x_K(i,j) = \omega_K^T\, z^x_K(i,j) \qquad (4.64)$$


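An expression for the variance of the estimate is now obtained in a straightforward manner:

$$\operatorname{var}\left[\widehat{\nabla I_r}(i,j)\cdot\hat{1}_x\right] = \frac{1}{\eta_x^2K^2(K+1)^2}\,E\left[\left(S^x_K(i,j)\right)^2\right]$$

$$= \frac{1}{\eta_x^2K^2(K+1)^2}\,E\left[\omega_K^T\, z^x_K(i,j)\, z^x_K(i,j)^T\, \omega_K\right]$$

$$= \frac{1}{\eta_x^2K^2(K+1)^2}\,\omega_K^T\, P^x_K\, \omega_K \qquad (4.65)$$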


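The estimator now takes the form

$$\widehat{\nabla I_r}(i,j)\cdot\hat{1}_x = \frac{\omega_K^T\, z^x_K(i,j)}{K(K+1)\,\eta_x} \qquad (4.66)$$

subject to the constraint that $K'$ is the smallest $K$ such that

$$\frac{1}{K(K+1)\,\eta_x}\left|\omega_K^T\, z^x_K(i,j)\right| \;\ge\; \frac{1}{R_x} + \frac{C}{K(K+1)\,\eta_x}\left[\omega_K^T\, P^x_K\, \omega_K\right]^{1/2} \qquad (4.67)$$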


The gradient magnitude components are individually estimated and then combined to form an estimate of the overall gradient magnitude.

$$\left|\widehat{\nabla I_r}(i,j)\right| = \left|\widehat{\nabla I_r}(i,j)\cdot\hat{1}_x\right| + \left|\widehat{\nabla I_r}(i,j)\cdot\hat{1}_y\right| \qquad (4.68)$$


If the noise in the reference image is uncorrelated, then $P^x_K$ is diagonal with all nonzero entries equal to $\sigma_n^2$, the global reference-image error variance. Because $\omega_K$ has $2K$ entries of magnitude one, this results in simplification of (4.65) to

$$\frac{1}{\eta_x^2K^2(K+1)^2}\,\omega_K^T\, P^x_K\, \omega_K = \frac{2\,\sigma_n^2}{\eta_x^2K(K+1)^2} \qquad (4.69)$$


There  is  an  upper  limit  on  K which  is  determined  by  the  minimum 
distance  between  the  pixel  at  which  the  gradient  magnitude  is  being 
estimated  and  the  nearest  image  boundary.  A smaller  upper  limit  may  be 
desirable  in  practice  due  to  considerations  of  computation  time  or 
hardware  complexity.  In  either  case,  if  the  upper  limit  of  K is 
reached  without  satisfying  the  constraint,  the  gradient  magnitude 
component  can  simply  be  estimated  to  be  zero.  This  precludes  the 
possibility  of  an  increasing  classification  error  rate  near  the 
boundary of the image. Figure 22 shows the minimum detectable gradient-magnitude component as a function of $\widehat{\nabla I_r}\cdot\hat{1}_i$ and $C\,\sigma_{n,REF}$.
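The estimator of (4.66) and its stopping rule (4.67) can be sketched in a few lines. The following is a minimal illustration, not the original implementation: the function names, the `rate` argument (standing for the product in the exponent of (4.62)), the `eta` argument (the factor from (4.55), assumed here to be the pixel spacing), and the `k_max` cap are all assumptions introduced for the example.

```python
import numpy as np

def markov_cov(K, sigma2, rate):
    """Covariance of 2K+1 samples of first-order Markov noise (assumed form of 4.62)."""
    idx = np.arange(2 * K + 1)
    return sigma2 * np.exp(-rate * np.abs(idx[:, None] - idx[None, :]))

def estimate_gradient_x(row, j, sigma2, rate, C, R, eta=1.0, k_max=8):
    """Estimate the x gradient component at column j of one image row.

    K grows until the constraint of (4.67) is met. If the cap or the image
    boundary is reached first, the component is reported as zero.
    Returns (estimate, K), with K = None when no K satisfied the constraint.
    """
    limit = min(k_max, j, len(row) - 1 - j)   # stay inside the image
    for K in range(1, limit + 1):
        omega = np.concatenate([-np.ones(K), [0.0], np.ones(K)])  # (4.63)
        z = row[j - K : j + K + 1]                                # (4.60)
        scale = K * (K + 1) * eta
        grad = omega @ z / scale                                  # (4.66)
        sigma = np.sqrt(omega @ markov_cov(K, sigma2, rate) @ omega) / scale
        if abs(grad) >= 1.0 / R + C * sigma:                      # (4.67)
            return float(grad), K
    return 0.0, None
```

On a noise-free ramp with slope 2 per pixel the first window already passes the test, while a flat row never does and is assigned a zero gradient component.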

We now have a gradient magnitude estimation algorithm with the desired characteristics. In the next section we will investigate the performance of this estimator, the distribution of gradient magnitudes in some actual images, and the improvement in $m_{N,AVE}$ that can be obtained by using the gradient magnitude histogram to select a reference set.


4.3.2  Experimental Reference Selection Using Gradient Magnitudes


Figure 23 through Figure 25 show the distribution of detected gradient magnitudes for the three different images with selected values of $C$ and estimated values of $\sigma_n$.

Figure 26 through Figure 31 display the gradient magnitudes as detected. In these images, large detected values of gradient magnitude show up as bright pixels. From this it is clear that in all three images there are a few pixels with very high gradient magnitudes and a large majority with relatively small values. The result is that the value of $m_{N,AVE}(dy,dx)$ decreases rapidly as $N$ increases. Figure 32, Figure 33, and Figure 34 locate the pixels incorporated into the reference set for $N = 128$.
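Selecting the reference set from the gradient magnitudes amounts to keeping the $N$ pixels with the largest detected values. A minimal sketch follows, assuming the detected gradient magnitudes are already available as a 2-D array; the function name and the return format are hypothetical and chosen only for this illustration.

```python
import numpy as np

def select_reference_set(grad_mag, n):
    """Return the (row, col) locations of the n largest gradient magnitudes.

    grad_mag is a 2-D array of detected gradient magnitudes; locations are
    returned as an (n, 2) integer array ordered from strongest to weakest.
    """
    flat = grad_mag.ravel()
    n = max(1, min(n, flat.size))
    # argpartition places the n largest values in the last n positions
    idx = np.argpartition(flat, flat.size - n)[flat.size - n:]
    idx = idx[np.argsort(flat[idx])[::-1]]        # strongest first
    rows, cols = np.unravel_index(idx, grad_mag.shape)
    return np.column_stack([rows, cols])
```

Because most pixels have small gradient magnitudes, each additional pixel admitted to the set is weaker than those already in it, which is why the average signal strength falls as $N$ grows.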

Figure 35 shows the auto-distance function for a 32 by 32 block of adjacent pixels from the reference image shown in Figure 63. The upper right corner of the reference set was located at row 43, column 43. When the gradient magnitude detection algorithm was used to select the reference set, a very significant increase in average signal strength was realized. Figure 36 illustrates the improvement that was realized when the 1024 pixels with maximum gradient magnitudes were used to form the reference set. When $N$ was reduced from 1024 to 128, a further increase in average signal strength was realized (see Figure 37).

Figure 26.  Gradient magnitude image of CARS for C = 1 and C ...

Figure 28.  Gradient magnitude image of TREES for C = 1 and C ...

Figure 32.  Reference set for CARS with N = 128

In any tracking system which uses a single processing element to perform all of the computations associated with evaluating $d^2$, there will be a trade-off to be made between the number of trial
registrations  that  can  be  attempted  and  the  number  of  pixels  to  be 
carried  in  the  comparison  set.  If  the  interframe  time  and  the  target 
dynamics  relative  to  the  sensor  optical  axis  are  known,  a maximum 
search  area  size  can  be  determined.  Search  area  size,  in  combination 
with  pixel  spacing  and  processor  speed,  leads  directly  to  an  upper 
bound  on  values  for  N.  At  this  point,  specification  of  a maximum 
allowable  probability  of  error  will  determine  the  minimum  acceptable 
signal-to-noise  ratio.  If  the  combination  of  scene  and  sensor  cannot 
provide  the  required  signal-to-noise  ratio,  a faster  processor  or  a 
smaller  search  area  is  indicated. 
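The chain of reasoning in this paragraph (target dynamics and frame time give a search radius, and the search area together with processor speed gives an upper bound on $N$) can be illustrated with a small calculation. This is only a sketch under stated assumptions: a square search region, a fixed hypothetical cost of `ops_per_pixel` operations per pixel per trial registration, and parameter names that do not come from the original text.

```python
import math

def max_reference_set_size(max_rate, frame_time, pixel_spacing,
                           ops_per_second, ops_per_pixel=3):
    """Bound the reference set size N for a single processing element.

    max_rate       largest target motion per second (same units as pixel_spacing)
    frame_time     interframe time in seconds
    pixel_spacing  spacing between adjacent pixels
    ops_per_second processor throughput (assumed figure)
    ops_per_pixel  assumed cost of one squared-difference accumulation
    Returns (search_radius_in_pixels, largest_N).
    """
    # Small guard against floating-point round-up of an exact quotient
    radius = math.ceil(max_rate * frame_time / pixel_spacing - 1e-9)
    trials = (2 * radius + 1) ** 2            # trial registrations per frame
    budget = ops_per_second * frame_time      # operations available per frame
    return radius, int(budget // (trials * ops_per_pixel))
```

For example, a motion of 8 pixels per second at a 0.5 second frame time gives a radius of 4 pixels and 81 trial registrations, so a processor performing 200,000 operations per second could carry at most 411 pixels under these assumptions.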

4.4  Summary 

In Chapter 4 we have assumed that the reference image was given, and dealt with three techniques to increase the effective tracking signal-to-noise ratio: first, by using non-uniform weights for the norm; second, by reducing the random noise component of raw data imagery in regions of low contrast; and, third, by selecting the reference set in a way which increases its average signal strength. In the next chapter we will consider how to get a good reference image.

Chapter 5

REFERENCE IMAGE ESTIMATION


In this chapter, we develop an adaptive Kalman filter to perform the reference-image-update task, prove that it is stable, and demonstrate the performance of the filter.

In Section 3.10 we investigated the sensitivity of the probability of error to both the signal and noise components of the image. While the $m_i(dy,dx)$ are a function of the sensor and the scene, the noise is a function of the sensor and the processing that is performed on the received images. If the signal-to-noise ratio at the input to the similarity detection process is maximized, then the performance of the system, as measured by the probability of error, is dependent on the reference-image-update process to minimize $\sigma^2_{n,REF}$. There are also benefits to the reference-set selection process when $\sigma^2_{n,REF}$ is reduced, since the gradient estimator performance is also dependent on this noise variable.

5.1  Adaptive  Kalman  Filter 

The classic formulation of the Kalman filter assumes a complete a priori knowledge of the process and the measurement noise statistics. In most practical applications these statistics are inexactly known. The use of incorrect a priori statistics can result in a Kalman filter which has large estimation errors or which may even be divergent. The purpose of an adaptive filter is to reduce these errors by modifying the filter to adapt it to the real data.

At this point we digress for a moment to review the general Kalman filtering problem. We will then establish the equivalence between the conventional Kalman filter notation and its specific application to the sequential image tracking problem, and develop the estimation procedure to be used to obtain the process and measurement noise statistics. For a more detailed review of Kalman filter theory see Gelb [12]. The particular approach to be followed in developing the adaptive filter largely follows Mehra [25]. Let

$$x_{i+1} = \Phi\, x_i + u_i \qquad (5.1)$$

$$z_i = H\, x_i + v_i \qquad (5.2)$$

where $x_i$ is the state vector, $\Phi$ is the state transition matrix, $u_i$ is the process noise vector which induces changes in $x_i$, $z_i$ is the measurement vector, $H$ is the measurement matrix, and $v_i$ is the measurement noise vector. Both $u_i$ and $v_i$ are assumed to be zero-mean, uncorrelated Gaussian sequences with

$$E[u_i] = 0 \qquad (5.3)$$

$$E[u_i u_j^T] = Q\,\delta_{ij} \qquad (5.4)$$

$$E[v_i] = 0 \qquad (5.5)$$

$$E[v_i v_j^T] = R\,\delta_{ij} \qquad (5.6)$$

where $\delta_{ij}$ is the Kronecker delta function and $Q$ and $R$ are bounded positive definite matrices. Let $\hat{x}_{i/j}$ be an estimate of $x_i$ based on the observation set $Z_j$ where

$$Z_j = \{z_1, z_2, \ldots, z_j\} \qquad (5.7)$$

Let $P_{i/j}$ be the covariance of the estimation error based on $Z_j$:

$$P_{i/j} = E\left\{[x_i - \hat{x}_{i/j}][x_i - \hat{x}_{i/j}]^T\right\} \qquad (5.8)$$

When $Q$ and $R$ are known, the minimum variance linear estimator is given by the Kalman filter of the form

$$\hat{x}_{i+1/i} = \Phi\,\hat{x}_{i/i} \qquad (5.9)$$

$$\hat{x}_{i/i} = \hat{x}_{i/i-1} + K_i\left[z_i - H\hat{x}_{i/i-1}\right] \qquad (5.10)$$

$$K_i = P_{i/i-1} H^T \left(H P_{i/i-1} H^T + R\right)^{-1} \qquad (5.11)$$

$$P_{i/i} = (I - K_i H)\,P_{i/i-1} \qquad (5.12)$$

$$P_{i+1/i} = \Phi\,P_{i/i}\,\Phi^T + Q \qquad (5.13)$$

where $K_i$ is the Kalman gain, and $\nu_i = z_i - H\hat{x}_{i/i-1}$ is called the innovation sequence. For the sequential image estimation problem both $Q$ and $R$ will be estimated from measurements made during the reference image update process.
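One cycle of the filter in (5.9) through (5.13) can be written directly from the equations. The sketch below is illustrative only; the function name and argument order are assumptions, and a plain matrix inverse is used for clarity rather than numerical robustness.

```python
import numpy as np

def kalman_step(x_pred, P_pred, z, Phi, H, Q, R):
    """One measurement update and time update of the standard Kalman filter.

    x_pred, P_pred are the predicted state and covariance (index i/i-1).
    Returns the filtered pair (i/i) and the next predicted pair (i+1/i).
    """
    innovation = z - H @ x_pred                               # z_i - H x_{i/i-1}
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # (5.11)
    x_filt = x_pred + K @ innovation                          # (5.10)
    P_filt = (np.eye(len(x_pred)) - K @ H) @ P_pred           # (5.12)
    x_next = Phi @ x_filt                                     # (5.9)
    P_next = Phi @ P_filt @ Phi.T + Q                         # (5.13)
    return x_filt, P_filt, x_next, P_next
```

With a scalar state, unit prediction variance, and unit measurement variance, the gain is one half, so the filtered state lies midway between the prediction and the measurement.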


The following equivalences establish the relationships between the notation of conventional Kalman filter theory and the particular variables of the image tracking problem as used in the preceding chapters. We let

$$x = I_s, \text{ the underlying image} \qquad (5.14)$$

$$\hat{x} = I_r, \text{ the reference image} \qquad (5.15)$$

$$z = I_d, \text{ the data image} \qquad (5.16)$$

$$H = \Phi = I, \text{ the identity matrix} \qquad (5.17)$$

$$R = \sigma^2_{n,DATA}\, I, \text{ the measurement noise covariance matrix} \qquad (5.18)$$

$$Q = q^2 I, \text{ the process noise covariance matrix} \qquad (5.19)$$

$$P_{i/i-1} = \sigma^2_{n,REF}\, I, \text{ the estimation error covariance matrix} \qquad (5.20)$$

where all of the identity matrices are of dimension $M_R M_C$ (the number of pixels in the reference image). We employ a covariance-matching technique to determine appropriate values for $\sigma^2_{n,DATA}$ and $q^2$. Note that while the underlying image is nonnegative, the state model allows for negative state vector components. To the extent that this does not represent the true situation with real images, the filter may produce suboptimal results. The covariance of the innovation sequence is [25]

$$E[\nu_i \nu_i^T] = E\left[(z_i - H\hat{x}_{i/i-1})(z_i - H\hat{x}_{i/i-1})^T\right] = P_{i/i-1} + R = \left(\sigma^2_{n,REF} + \sigma^2_{n,DATA}\right) I \qquad (5.21)$$


Any detected deviation above this value is taken as an indication that the filter is not optimal (in the sense of minimum variance) and that $q^2$ and $\sigma^2_{n,DATA}$ should be adjusted to bring the filter back toward optimal performance. The reference image update filter will maintain estimates of both $\sigma^2_{n,DATA}$ and $\sigma^2_{n,REF}$ and use the difference image associated with the minimum distance registration as the innovation sequence.

Since the difference image contains a large number of pixels, the sample statistics for the difference image should closely approximate the true underlying statistics, i.e. the bias and variance of the sample statistics will be small. The sample statistic that will be used in the estimation of $Q$ and $R$ is the difference image sample variance $v_i$.

$$v_i = \frac{1}{M_D}\sum_k\sum_j D_i^2(dy,dx,k,j) - \left[\frac{1}{M_D}\sum_k\sum_j D_i(dy,dx,k,j)\right]^2 \qquad (5.22)$$

where $M_D$ is the number of pixels in the difference image associated with the minimum distance.

$$M_D = \left[\mathrm{MIN}\{M_R,\, M_R - dy\} - \mathrm{MAX}\{1,\, dy\}\right] \times \left[\mathrm{MIN}\{M_C,\, M_C - dx\} - \mathrm{MAX}\{1,\, dx\}\right] \qquad (5.23)$$
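The sample variance of the difference image can be computed over the region where the shifted data image and the reference image overlap. The sketch below is a simplified illustration: the sign convention for the shift (`dy`, `dx`) and the overlap bookkeeping are assumptions of this example and are not claimed to reproduce (5.23) term for term. It does use the biased normalization (division by the pixel count) discussed in the text.

```python
import numpy as np

def difference_variance(ref, data, dy, dx):
    """Biased sample variance of the difference image at registration (dy, dx).

    A reference pixel (r, c) is compared with the data pixel (r + dy, c + dx);
    only pixels where both images are defined contribute.
    Returns (variance, number_of_pixels_used).
    """
    m_r, m_c = ref.shape
    r0, r1 = max(0, dy), min(m_r, m_r + dy)   # data rows inside the overlap
    c0, c1 = max(0, dx), min(m_c, m_c + dx)   # data columns inside the overlap
    diff = data[r0:r1, c0:c1] - ref[r0 - dy:r1 - dy, c0 - dx:c1 - dx]
    # ndarray.var() divides by the pixel count (ddof=0), i.e. the biased estimate
    return float(diff.var()), diff.size
```

A constant offset between the two images contributes nothing to this statistic, because the squared mean of the difference image is subtracted.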


The mean and variance of $v_i$ are

$$E[v_i] = \sigma^2_{n,REF}(i-1) + \sigma^2_{n,DATA}(i-1) \qquad (5.24)$$

$$\operatorname{var}[v_i] = \frac{2\left[\sigma^2_{n,REF}(i-1) + \sigma^2_{n,DATA}(i-1)\right]^2}{M_D - 1} \qquad (5.25)$$


While $v_i$ is a biased estimate ($M_D$ should be smaller by one to be unbiased), (5.22) is preferred over the unbiased estimate since it gives less mean-square error [15]. We will use $v_i - \hat\sigma^2_{n,REF}(i-1)$ as a measure of change in $\hat\sigma^2_{n,DATA}$, and define a time constant $\beta$ ($0 < \beta < 1$) for changes in $\hat\sigma^2_{n,DATA}$:

$$\hat\sigma^2_{n,DATA}(i) = \beta\,\hat\sigma^2_{n,DATA}(i-1) + (1-\beta)\left[v_i - \hat\sigma^2_{n,REF}(i-1)\right] \qquad (5.26)$$


The basis for this restriction on the rate of change of $\hat\sigma^2_{n,DATA}$ is the assumption that sensor noise variance is a function of parameters which change relatively slowly compared to the sensor frame rate. In a vidicon sensor, it could be the faceplate temperature or target voltage. In other sensor types, other noise sources respond to the environment with finite time response. The remainder of the difference between $v_i$ and the filter estimates of $\hat\sigma^2_{n,DATA}$ and $\hat\sigma^2_{n,REF}$, denoted $T_2$, is attributed to change in the underlying image $I_s$ and is assigned to $q_i^2$ to increase the filter estimate of $\hat\sigma^2_{n,REF}$ prior to the calculation of the next Kalman gain.

$$T_2 = v_i - \hat\sigma^2_{n,REF}(i-1) - \hat\sigma^2_{n,DATA}(i-1) \qquad (5.27)$$

Since $Q$ and $R$ must be positive definite matrices, a precautionary restriction is placed on $q_i^2$ and on $\hat\sigma^2_{n,DATA}$ (see (5.33)):

$$q_i^2 = \mathrm{MAX}\{0,\, T_2\} \qquad (5.28)$$


To initialize the filter, we take the first data image as the first reference image since there is no better information available about $I_s$, and for $\hat\sigma^2_{n,REF}$ we use $\hat\sigma^2_{n,DATA}$, an a priori guess at the variance of the noise component of $I_d$.

The full set of adaptive filter equations is summarized as follows in terms of the variables unique to this problem:

Initialization

$$\hat\sigma^2_{n,REF}(1) = \hat\sigma^2_{n,DATA}(1) \qquad (5.29)$$

$$q_i^2 = \mathrm{MAX}\{0,\, T_2\} \qquad (5.36)$$

$$\hat\sigma^2_{n,REF}(i) = \frac{\hat\sigma^2_{n,REF}(i-1)\,\hat\sigma^2_{n,DATA}(i)}{\hat\sigma^2_{n,REF}(i-1) + \hat\sigma^2_{n,DATA}(i)} + q_i^2 \qquad (5.37)$$
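The variance recursion can be exercised with a short scalar sketch. It is assembled only from (5.26), (5.27), (5.28), and (5.37) as they appear here; the clamping of the data-noise estimate at zero is an assumed reading of the precautionary restriction mentioned with (5.28), the zero gain chosen when both variances vanish is one of the two options the text allows, and all function and variable names are hypothetical.

```python
def adaptive_update(var_ref, var_data, v, beta):
    """One step of the noise-variance recursion.

    var_ref, var_data are the previous estimates (index i-1), v is the
    difference-image sample variance, and beta is the smoothing constant.
    Returns (new_var_ref, new_var_data, gain, q2).
    """
    t1 = beta * var_data + (1.0 - beta) * (v - var_ref)   # unclamped (5.26)
    t2 = v - var_ref - var_data                           # (5.27)
    new_var_data = max(0.0, t1)                           # assumed clamp at zero
    q2 = max(0.0, t2)                                     # (5.28)
    denom = var_ref + new_var_data
    gain = var_ref / denom if denom > 0.0 else 0.0        # assumed choice when 0/0
    new_var_ref = gain * new_var_data + q2                # (5.37)
    return new_var_ref, new_var_data, gain, q2

def update_reference(ref_img, data_img, gain):
    """Blend the registered data image into the reference image, per (5.10) with H = I."""
    return ref_img + gain * (data_img - ref_img)
```

Feeding the recursion a sample variance equal to the current reference variance plus a true data variance of 25, starting from an initial guess of 29, drives the data-variance estimate toward 25 while the reference variance decays, which is the qualitative behavior described for a fixed underlying image.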


In the next section we will analyze the stability of this algorithm.


5.2  Filter Stability Analysis

In the design of an adaptive Kalman filter, because of the ad hoc nature of the covariance matching process, the question of stability must be addressed. In this section, we show that the filter is stable except during periods when the observed difference image variance indicates that the filter estimate of $\hat\sigma^2_{n,DATA} + \hat\sigma^2_{n,REF}$ is too low. During these periods the filter enters an unstable region of operation, increasing $\hat\sigma^2_{n,DATA} + \hat\sigma^2_{n,REF}$ until the sum is once again in agreement with the observed data.


Eliminating $q_i$ from (5.26) and (5.36) and letting

$$K_i^+ = \frac{\hat\sigma^2_{n,REF}(i-1)}{\hat\sigma^2_{n,REF}(i-1) + \hat\sigma^2_{n,DATA}(i)} \qquad (5.38)$$

we have recursive equations for $\hat\sigma^2_{n,DATA}$ and $\hat\sigma^2_{n,REF}$ with $v_i$ as the only forcing function

$$\hat\sigma^2_{n,DATA}(i) = \beta\,\hat\sigma^2_{n,DATA}(i-1) + (1-\beta)\,v_i - (1-\beta)\,\hat\sigma^2_{n,REF}(i-1) \qquad (5.39)$$

$$\hat\sigma^2_{n,REF}(i) = \left[1 - K_i^+\right]\hat\sigma^2_{n,REF}(i-1) + v_i - \hat\sigma^2_{n,REF}(i-1) - \hat\sigma^2_{n,DATA}(i-1) \qquad (5.40)$$


The constraints placed on the propagation equations define four potential regions of operation for the filter:

Region I:    $T_1 < 0$,  $T_2 < 0$

Region II:   $T_1 > 0$,  $T_2 < 0$

Region III:  $T_1 > 0$,  $T_2 > 0$

Region IV:   $T_1 < 0$,  $T_2 > 0$


From (5.32) and (5.34) we see that

$$T_1 \ge 0 \;\Rightarrow\; v_i \ge \hat\sigma^2_{n,REF}(i-1) - \frac{\beta}{1-\beta}\,\hat\sigma^2_{n,DATA}(i-1) \qquad (5.41)$$

$$T_2 > 0 \;\Rightarrow\; v_i > \hat\sigma^2_{n,REF}(i-1) + \hat\sigma^2_{n,DATA}(i-1) \qquad (5.42)$$

hence it is clear that $T_2 > 0$ is the more restrictive constraint, i.e.

$$T_2 > 0 \;\Rightarrow\; T_1 > 0$$

which precludes the possibility of operating in Region IV.
Incorporating  the  constraints  associated  with  each  region  of  operation 
and  writing  the  resulting  equations  in  matrix  form: 


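Region II

$$T_1 > 0 \;\Rightarrow\; \hat\sigma^2_{n,DATA}(i) = \beta\,\hat\sigma^2_{n,DATA}(i-1) + (1-\beta)\left[v_i - \hat\sigma^2_{n,REF}(i-1)\right]$$

$$T_2 < 0 \;\Rightarrow\; q_i^2 = 0$$

therefore

$$\hat\sigma^2_{n,REF}(i) = K_i^+\,\hat\sigma^2_{n,DATA}(i) = K_i^+\beta\,\hat\sigma^2_{n,DATA}(i-1) + K_i^+(1-\beta)\,v_i - K_i^+(1-\beta)\,\hat\sigma^2_{n,REF}(i-1)$$

and, with $s_i = \left[\hat\sigma^2_{n,REF}(i),\; \hat\sigma^2_{n,DATA}(i)\right]^T$,

$$s_i = \begin{bmatrix} -K_i^+(1-\beta) & K_i^+\beta \\ -(1-\beta) & \beta \end{bmatrix} s_{i-1} + \begin{bmatrix} K_i^+(1-\beta) \\ 1-\beta \end{bmatrix} v_i$$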


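Region III

$$T_1 > 0 \;\Rightarrow\; \hat\sigma^2_{n,DATA}(i) = \beta\,\hat\sigma^2_{n,DATA}(i-1) + (1-\beta)\left[v_i - \hat\sigma^2_{n,REF}(i-1)\right]$$

$$T_2 > 0 \;\Rightarrow\; q_i^2 = v_i - \hat\sigma^2_{n,REF}(i-1) - \hat\sigma^2_{n,DATA}(i-1) \qquad (5.52)$$

therefore

$$\hat\sigma^2_{n,REF}(i) = K_i^+\,\hat\sigma^2_{n,DATA}(i) + v_i - \hat\sigma^2_{n,REF}(i-1) - \hat\sigma^2_{n,DATA}(i-1)$$

$$= K_i^+\beta\,\hat\sigma^2_{n,DATA}(i-1) + K_i^+(1-\beta)\,v_i - K_i^+(1-\beta)\,\hat\sigma^2_{n,REF}(i-1) + v_i - \hat\sigma^2_{n,REF}(i-1) - \hat\sigma^2_{n,DATA}(i-1) \qquad (5.53)$$

$$= -\left[K_i^+(1-\beta) + 1\right]\hat\sigma^2_{n,REF}(i-1) + \left[K_i^+\beta - 1\right]\hat\sigma^2_{n,DATA}(i-1) + \left[K_i^+(1-\beta) + 1\right]v_i \qquad (5.54)$$

and

$$s_i = \begin{bmatrix} -K_i^+(1-\beta) - 1 & K_i^+\beta - 1 \\ -(1-\beta) & \beta \end{bmatrix} s_{i-1} + \begin{bmatrix} K_i^+(1-\beta) + 1 \\ 1-\beta \end{bmatrix} v_i$$
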

In order for the filter to be stable, the homogeneous solution to the propagation equations must decay to zero. This requires that the eigenvalues of the propagation matrix lie inside the unit circle. In Region I, $s_i = 0$ is a degenerate case. $K_{i+1}$ will be indeterminate and can be taken as either one, based on the prediction that the next data image will be perfect, or zero, based on the observation that the reference image is already perfect. In practice, this case will rarely occur, and when it does, both $\hat\sigma^2_{n,DATA}$ and $\hat\sigma^2_{n,REF}$ will be restored to nonzero values as soon as a nonzero $v_i$ is observed; thus, the filter is stable in this region of operation.

For Regions II and III, we solve the characteristic equation for the eigenvalues of the propagation matrix and investigate the range of possible eigenvalues in each case.

(5.60)

so that

$$K_i^+(1-\beta) < 1 + \beta$$

Since

$$K_i^+ \le 1 \quad\text{and}\quad 1 - \beta < 1 \qquad (5.61)$$

we have

$$K_i^+(1-\beta) < 1 < 1 + \beta \qquad (5.62)$$

and Region II operation is stable.
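The Region II conclusion can be checked numerically. The propagation matrix used below is obtained by writing the two Region II recursions (the $\hat\sigma^2_{n,REF}$ equation with $q_i^2 = 0$ and the $\hat\sigma^2_{n,DATA}$ equation) as a single linear update with the reference variance as the first state component; that arrangement, and the function names, are assumptions of this sketch.

```python
import numpy as np

def region_ii_matrix(k_plus, beta):
    """Homogeneous propagation matrix for [var_ref, var_data] in Region II."""
    return np.array([[-k_plus * (1.0 - beta), k_plus * beta],
                     [-(1.0 - beta),          beta]])

def spectral_radius(matrix):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(matrix))))

def region_ii_is_stable(k_values, beta_values):
    """True when every sampled (K+, beta) pair has all eigenvalues inside the unit circle."""
    return all(spectral_radius(region_ii_matrix(k, b)) < 1.0
               for k in k_values for b in beta_values)
```

The determinant of this matrix is zero, so one eigenvalue is zero and the other equals the trace, $\beta - K_i^+(1-\beta)$. Its magnitude is below one exactly when $K_i^+(1-\beta) < 1 + \beta$, which is the condition quoted in the text.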


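Region III

$$\det\begin{bmatrix} -K_i^+(1-\beta) - 1 - \lambda & K_i^+\beta - 1 \\ -(1-\beta) & \beta - \lambda \end{bmatrix} = \lambda^2 + (1-\beta)(1 + K_i^+)\,\lambda - 1 \qquad (5.63)$$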


the  solutions  are 


(5.64) 


and 


2 


(5.65) 
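As a worked check on the Region III result, the quadratic formula can be applied to the characteristic polynomial (5.63), read as $\lambda^2 + (1-\beta)(1+K_i^+)\lambda - 1 = 0$. The shorthand $a$ is introduced only for this illustration, and the expressions below are not claimed to match the exact form of (5.64) and (5.65) in the original.

```latex
\begin{align*}
a &= (1-\beta)(1+K_i^+) > 0, \\
\lambda_{1,2} &= \frac{-a \pm \sqrt{a^2 + 4}}{2}, \\
\lambda_1 \lambda_2 &= -1, \\
|\lambda_2| &= \frac{a + \sqrt{a^2+4}}{2} > \frac{\sqrt{4}}{2} = 1 .
\end{align*}
```

Because the product of the roots has magnitude one and the roots differ in magnitude whenever $a > 0$, one eigenvalue must lie outside the unit circle, which is consistent with the statement that Region III is unstable.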


Region III is an unstable region of operation for the filter, but the only time that the filter will operate in this region is when there is evidence (from the innovation sequence) that the sum of $\hat\sigma^2_{n,DATA}$ and $\hat\sigma^2_{n,REF}$ is too small and should be increased to match the observed sample variance of the difference image.

This might occur if the frame-to-frame translation error exceeded the radius of the search region. In this case the correct registration would not be one of the trial registrations and the minimum distance would be greater than the predicted value. This event is indicative of a change in the underlying image with respect to the reference image and could be interpreted as a manifestation of a loss of track. Thus, the filter will operate in Region III until $T_2$ becomes negative, at which time the filter reverts to stable operation in Region II.

In  Chapter  6 a simulation  result  will  illustrate  this 
characteristic  rapid  adaptation  of  the  filter  to  a loss  of  lock 
condition  and  the  ability  of  an  integrated  tracking  algorithm  to 
reacquire  the  target  automatically. 

5.3  Kalman  Filter  Performance 

The  performance  of  the  reference-image  update  process  is 
strongly  dependent  on  the  ability  of  the  similarity  detection  algorithm 
to  correctly  register  the  incoming  data  image.  If  the  location  of  the 
best  match  does  not  correspond  to  the  correct  registration,  the  filter 
will  incorporate  the  resulting  error  into  the  reference  image  and 
increase  the  reference-image  noise  variance. 

Figure 38 illustrates the ability of the adaptive Kalman filter to correctly estimate the data-image noise variance, even though the initial estimate is considerably in error.

Figure 38.  Kalman Filter Variance Estimates and Residual Error Variance (initial value: variance estimate = 29; added noise: variance = 25; with pre-smoothing and fixed underlying image; horizontal axis: frame number)


In this case, the data image sequence was generated by corrupting the image of Figure 63 with an uncorrelated Gaussian noise sequence with $\sigma^2_{n,DATA} = 25$. The initial value for $\hat\sigma^2_{n,DATA}$ was 29 and $\beta$ was set to have a time constant of 15 frames. Even though the incoming data has a noise component with variance 25, application of the non-linear peak elimination filter as a presmoother results in a filter estimate of the reference-image noise variance of only 14. This agrees quite closely with the approximately 50% reduction in noise variance which was realized when the non-linear peak elimination filter was applied to a pure noise image. Operating at this noise level with $N = 128$, the similarity detection process does make errors, as shown by Figure 39.

After  60  frames,  the  tracker  has  built  up  an  error  of  2 pixels 
in  the  horizontal  direction,  and  1 pixel  in  the  vertical  direction. 

Note  that  most  of  the  errors  accumulate  during  the  initialization 
transient  and  before  the  Kalman  filter  has  had  time  to  reduce  the 
reference  image  noise  variance. 

In  Section  4.2  the  nonlinear  peak  elimination  prefilter  was 
developed  to  reduce  the  random  noise  in  regions  of  the  image  with  low 
gradient  magnitudes.  Figure  40  shows  the  difference  image  distribution 
with  and  without  the  prefilter  in  use. 

From this, it seems clear that while the prefilter does reduce the difference-image variance, the characteristic shape of the distribution is preserved.

Figure 40.  Difference Image Distributions With and Without the Non-Linear Peak Elimination Filter

5.4  Summary

In this chapter an adaptive Kalman filter was formulated to perform the reference image update function in a generalized image tracking system. The filter was analyzed to determine its stability characteristics, and its ability to significantly reduce the reference image noise variance was demonstrated through simulation. In addition, the use of the non-linear peak elimination filter as a prefilter for the sensor data was shown to not affect the difference image statistics other than to reduce the sample variance.

In  the  next  chapter,  the  various  pieces  of  a tracking  system 
that  have  been  developed  in  chapters  three,  four,  and  five  will  be 
integrated  into  a single  algorithm  and  evaluated  as  a whole  via 
simulation  using  real  sensor  data. 


Chapter  6 


TRACKER  PERFORMANCE 


Up  to  this  point,  the  individual  techniques  for  improving 
tracking  performance  have  been  analyzed  and  demonstrated  in  isolation. 
In  this  chapter,  an  integrated  tracking  algorithm  is  proposed  which 
incorporates  the  concepts  developed  in  previous  chapters,  and  the 
performance  of  this  integrated  tracking  algorithm  is  demonstrated  in 
the  presence  of  both  noise  and  image  change. 

6.1  An  Integrated  Tracking  Algorithm 

An  integrated  tracking  algorithm  was  developed  to  incorporate 
the  nonlinear  peak  elimination  prefilter,  the  adaptive,  reference-set 
selection  process  using  the  gradient-magnitude  estimation  algorithm 
from  Section  4.3.1  and  the  adaptive  Kalman  filter  to  perform  the 
reference-image  update  function.  The  logic  flow  for  this  algorithm  is 
shown  in  Figure  41. 

When  this  algorithm  was  implemented  for  computer  simulation  the 
following  features  were  included: 

1)  The  number  of  pixels  in  the  comparison  set  was 
adjustable  up  to  a value  of  N = 1024  (limited  by 
computer  memory). 

2)  The  data  image  source  was  selectable  between 
either  a sequence  of  noise  corrupted  copies  of  a fixed 
image  or  one  of  the  three  data  image  sequences 
discussed  in  Appendix  I. 

3)  Each of the three component algorithms could be
turned off to allow the effects of its absence to be
evaluated.

When the nonlinear peak elimination prefilter was turned off, no
prefiltering  was  performed.  When  the  adaptive,  reference-set  selection 
algorithm  was  turned  off,  the  reference-set  was  taken  as  a fixed  grid 
of  pixels  centered  on  the  aimpoint  with  a selectable  inter-pixel 
spacing.  Thus,  a contiguous  block  of  pixels  could  be  used  for  one 
simulation  run,  and  a sparse  grid  could  be  used  for  another.  When  the 
Kalman  filter  was  turned  off,  the  reference  image  update  process  simply 
copied  the  data  image  as  the  source  for  extracting  the  next  reference 
set.  This  procedure  presented  the  maximum  number  of  opportunities  for 
the  tracker  to  accumulate  error  in  a fixed  amount  of  computer 
simulation  time. 

The performance of the tracker can be separated into two parts for evaluation purposes. The first part is the performance of the similarity detector (with or without prefiltering) in the presence of noise in the data image. In Section 3.10 it was shown that the sum of the noise in the reference image and the noise in the data image is the factor which determines probability of error and thus, for a particular image, the mean-square tracking error. The second part is the performance of the adaptive Kalman filter in estimating the underlying image from the data image sequence. While the filter can never reduce the reference-image noise component to zero, it can come very close to reducing the sum of $\sigma^2_{n,REF}$ and $\sigma^2_{n,DATA}$ by a factor of two from what it would be without the filter (without the filter, $\sigma^2_{n,REF} = \sigma^2_{n,DATA}$). Recall that for moderate values of $N$, a signal-to-noise ratio improvement of a factor of two can make a very significant contribution to reducing the probability of error (see Figure 11).

In Section 5.3 the performance of the adaptive Kalman filter was demonstrated with respect to its ability to reduce the reference image noise variance. That performance is independent of $N$ and depends only on the ability of the similarity detector to provide a sequence of registered images. The similarity detector performance was measured by Monte Carlo simulation. Using the image in Figure 63 as the reference image, $N$ = 32, 64, 128, and 1024, and the adaptive reference selection algorithm, 100 noisy data images were matched against the known perfect reference set. The mean-square registration error was computed for each set of 100 data images and is shown in Figure 42 for various values of $\sigma^2_{n,DATA}$. Monte Carlo runs of 64, 81, 100, and 200 images were made for $N = 64$ with only small changes in mean-square error.
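A Monte Carlo experiment of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the simulation used for Figure 42: the data images are noisy copies of the reference image with zero true shift, the distance is an unweighted sum of squared differences over the reference set, the search is over integer shifts in a square region, and all function names are hypothetical.

```python
import numpy as np

def register(ref_img, data_img, ref_pixels, radius):
    """Minimum-distance registration over integer shifts within +/- radius.

    ref_pixels is an (N, 2) integer array of (row, col) locations that must
    lie at least `radius` pixels away from the image border.
    """
    rows, cols = ref_pixels[:, 0], ref_pixels[:, 1]
    best_d2, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            diff = data_img[rows + dy, cols + dx] - ref_img[rows, cols]
            d2 = float(np.sum(diff ** 2))
            if d2 < best_d2:
                best_d2, best_shift = d2, (dy, dx)
    return best_shift

def mean_square_registration_error(ref_img, ref_pixels, noise_var,
                                   radius, trials, rng):
    """Average squared registration error over `trials` noisy data images."""
    errors = []
    for _ in range(trials):
        noise = rng.normal(0.0, np.sqrt(noise_var), ref_img.shape)
        dy, dx = register(ref_img, ref_img + noise, ref_pixels, radius)
        errors.append(dy ** 2 + dx ** 2)   # true shift is (0, 0)
    return float(np.mean(errors))
```

Repeating the call for several reference set sizes and noise variances would give a family of curves of the kind described for Figure 42.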

The  nonlinear  peak  elimination  filter  provides  approximately  a 
20% reduction in mean-square error for this particular image at a noise
variance  of  23.  Figure  43  illustrates  the  decrease  in  average  signal 
strength  that  accompanies  the  increase  in  N for  this  particular 
reference  image  for  a shift  of  +1  pixel  along  the  x-axis.  This 
phenomenon  of  decreasing  mean  squared  error  in  the  face  of  decreasing 
average  signal  strength  serves  to  illustrate  the  fact  that  increasing 
the  number  of  elements  in  the  reference  set  more  than  offsets  the 
decrease  in  average  signal  strength. 

Figure 43. Average Signal Strength as a Function of Reference Set Size

Up to this point, all simulations have used a single known
underlying image. While this technique provides excellent control over
the simulation parameters and absolute knowledge about the relative
motion of the image on a frame-to-frame basis, it does not allow
exploration of the capability of the integrated tracking algorithm to
track an object which is truly changing size and shape as well as
position within the image. It is the potential ability of the adaptive
Kalman filter to maintain an accurate estimate of the underlying image
in the presence of change that promises to improve the performance of
the total tracking system. The capability of adapting to a changing
scene and deriving a measure of system performance from the noise
variance estimates and the measured signal strength will provide a
system designer with features not previously available in
image-tracking systems.

As  an  example  of  the  performance  that  can  be  obtained  through 
integration  of  nonlinear  prefiltering,  adaptive  reference  selection  and 
the  adaptive  Kalman  Filter,  the  integrated  tracker  was  used  to  track 
the  data  image  sequence  AIRPLANE  for  89  frames.  The  nose  of  the 
aircraft  was  designated  in  the  first  frame,  and  the  detected  frame-to- 
frame  motion  was  accumulated  to  produce  the  estimated  position  of  the 
aircraft  within  each  sequential  image.  After  each  reference  image 
update,  the  reference  image  was  written  onto  magnetic  tape  with  the 
location  of  the  estimated  target  position  marked  with  a cross  hair.  A 
sample  of  these  reference  images  with  the  indicated  aimpoints  marked  is 
shown  in  Figure  44  through  Figure  48. 

These  images  correspond  to  the  reference  image  after  being 
updated  from  the  corresponding  image  in  Figure  58  through  Figure  62. 
Figure  49  shows  the  detected  motion  of  the  image  sequence  with  every 
tenth  frame  numbered  (some  frames  had  no  detected  motion  so  there  are 
not  necessarily  nine  points  between  each  marked  frame). 
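
The aimpoint bookkeeping described above amounts to a running sum of the detected frame-to-frame shifts, starting from the designated point. A minimal sketch follows; the function name and the (x, y) tuple convention are assumptions for illustration.

```python
def accumulate_track(start, frame_motions):
    """Return the estimated (x, y) position after each frame.

    start          -- designated position in the first frame
    frame_motions  -- detected (dx, dy) shift for each later frame;
                      (0, 0) means no motion was detected
    """
    positions = [start]
    x, y = start
    for dx, dy in frame_motions:
        x, y = x + dx, y + dy
        positions.append((x, y))
    return positions
```

Because each position is the sum of all earlier detected shifts, a single registration error persists in every later position.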
Figure 44. Updated reference image after frames 1 and 11

Figure 46. Updated reference image after frames 41 and 51 (panels: Frame 41, Frame 51)
The filter estimates of $\sigma^2_{n,DATA}$ and $\sigma^2_{n,REF}$ are shown in
Figure 50 and the variance of the difference image is plotted in Figure
51.

For  this  simulation,  there  were  128  elements  in  the  reference 
set.  For  comparison,  Figure  52  shows  the  detected  motion  for  a minimum 
norm  tracker  using  a 256-pixel  reference  set  arranged  in  a contiguous 
block  (16  by  16)  centered  at  the  initial  position  indicated  by  the 
cross  hair  in  Figure  44  but  with  no  prefiltering  and  no  Kalman  filtered 
reference  update. 

While  this  tracker  takes  maximum  advantage  of  the  correlation 
between  correct  and  incorrect  trial  registrations,  and  uses  twice  as 
many pixels in the reference set, it cannot track the motion of the
image  sequence. 

Figure  53,  Figure  54  and  Figure  55  illustrate  the  performance 
of  the  integrated  tracking  algorithm  in  the  presence  of  added  noise. 

Two important tracker characteristics are demonstrated in this
tracking sequence. First, the filter was initialized with a value of
$\sigma^2_{n,DATA}(0) = 29$, which is an unnecessarily pessimistic value. The
filter, however, rapidly diagnosed that this value was not consistent
with the observed difference image sample variances and reduced the
estimate of $\sigma^2_{n,DATA}$ to about 20 over a period of 30 frames (two time
constants for the selected value of the filter parameter). This value then remained
approximately constant for the rest of this run with the exception of a
perturbation around frame 65 due to a loss of track and the ensuing
reacquisition. Second, the loss of track which occurs at frame 63 is
immediately recognized by the filter as a significant event. The very
rapid increase in $\sigma^2_{n,REF}$, which allows the reference image to change
quickly, is a direct result of the filter entering Region III of
operation (the unstable region). By frame 67 the filter has reentered
Region II, and it has reduced $\sigma^2_{n,REF}$ to near its previous value by frame
71.

Figure 50. Kalman Filter Variance Estimates for Integrated Tracking
Algorithm (Image Sequence AIRPLANE)

Figure 51. Difference Image Sample Variance for Integrated Tracking
Algorithm (Image Sequence AIRPLANE)

Figure 52. Detected Image Motion for Image Sequence
AIRPLANE Using a 16x16 Block Reference
and No Prefilter or Reference Update Filter

Figure 53. Sensor Noise Variance Estimate for Integrated Tracking Algorithm
when Tracking Image Sequence AIRPLANE in the Presence of Added Noise


The gradual increase in $\hat{\sigma}^2_{n,REF}$ from frame 72 to the end of the
data sequence is attributable to the variation in size and shape of the
aircraft image, and demonstrates the ability of the filter to
accommodate itself to actual image change.

The  conclusions  to  be  drawn  from  these  simulations  and  those  of 
Section 5.3 are that:

1)  Adaptive  reference  selection  maximizes  the 
signal  component  of  the  reference  set. 

2)  Performance  of  the  minimum  norm  tracking 
algorithm  as  measured  by  mean  square  registration  error 
improves  with  increasing  N and  also  improves  with 
increasing  signal-to-noise  ratio  for  fixed  N. 

3)  The  nonlinear  peak  elimination  prefilter 
reduces mean square tracking error through reduction of
the  data  image  noise  variance. 

4)  The  adaptive  Kalman  filter  can  significantly 
reduce  the  reference  image  noise  variance  and 
simultaneously  estimate  both  the  reference  image  noise 
variance  and  the  data  image  noise  variance. 
6.2 Summary


In this chapter the performance characteristics of an
integrated tracking algorithm have been investigated. The selected
algorithm incorporated the non-linear peak elimination prefilter,
adaptive reference set selection using the gradient magnitude histogram
for selecting reference pixels, and the adaptive Kalman filter for
reference image update. The benefits of increasing reference set size
were illustrated and the capability of the integrated tracking
algorithm to accurately track an image sequence was demonstrated.


Chapter  7 

CONCLUSIONS  AND  RECOMMENDATIONS 


The major thrust of this research was directed toward
developing new techniques for tracking sequences of digitized images.
A model of a generalized image tracking system was defined for use as a
basis for analysis, and four new techniques were developed. The
practical implications of these techniques are summarized in the next
section, as are the conclusions which can be drawn from this work. In
the last section several recommended areas for future research are
pointed out.


7.1  Summary  and  Conclusions 


Four new techniques were developed for application to the
general sequential image tracking problem:

1) A non-linear peak elimination prefilter

2) Two techniques for similarity detection:

a) A non-uniformly weighted norm

b) An adaptive reference set selection algorithm based on the
gradient magnitude histogram (including a new and very effective
gradient magnitude estimator)

3) An adaptive Kalman filter to perform the
reference image update

Although the four techniques which were developed apply to three
different functional areas in the general image tracking system, it is
possible to provide a subjective evaluation of their relative merit,
with the exception of the non-uniformly weighted norm and the adaptive
reference set selection algorithm, which are not directly comparable.


The  greatest  payoff  is  obtained  by  using  the  adaptive  Kalman 
filter  to  maintain  a high  accuracy,  low  noise  reference  image  at  all 
times.  The  effective  signal-to-noise  ratio  for  the  tracker  is  almost 
doubled when the filter is used, with a corresponding improvement in
tracker  performance. 
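
As a rough illustration of why a recursively filtered reference nearly doubles the effective signal-to-noise ratio, the sketch below applies a textbook scalar Kalman update to each reference pixel. It is not the adaptive filter developed in this work: the noise variances are assumed known and fixed, the scene is assumed static unless `process_var` is set, and all names are hypothetical.

```python
def kalman_reference_update(ref, ref_var, data, data_var, process_var=0.0):
    """Apply one scalar Kalman step to every reference pixel.

    ref, data    -- lists of registered pixel intensities (same length)
    ref_var      -- current reference image noise variance
    data_var     -- data image noise variance
    process_var  -- assumed variance of true scene change per frame
    Returns (new_ref, new_ref_var).
    """
    prior_var = ref_var + process_var
    gain = prior_var / (prior_var + data_var)
    # Move each reference pixel toward the new data by the Kalman gain.
    new_ref = [r + gain * (d - r) for r, d in zip(ref, data)]
    return new_ref, (1.0 - gain) * prior_var
```

Starting from a reference variance equal to a data noise variance of 20, nine updates of a static scene reduce the reference variance to 2, so the summed variance falls from 40 to 22.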

The next most useful of the techniques developed is the
adaptive reference set selection algorithm. The performance
improvement which is obtained by tracking on subsets of the reference
and data images comes from the correspondingly larger image that can be
processed. For example, a tracker which today can process 256
reference pixels and 256 trial registrations per frame may only
maintain a data image containing 1024 total pixels and a reference
image of 256 pixels. By using the adaptive reference set selection
algorithm, a much larger reference image can be maintained (perhaps as
large as the entire data image) while only processing a small subset to
determine image misregistration. The resulting signal-to-noise ratio
is substantially enhanced by using only the "good" pixels for the
reference set; the same processor speed can be tolerated. The one
factor on which this projection depends is the availability of a device
to perform the gradient estimation task at real-time rates. While the
gradient estimator developed in Chapter 4 has many attractive features,
a less complex gradient estimator in hardware might prove to be
satisfactory in implementing the adaptive reference set selection
algorithm for a particular application.
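
The idea of keeping only the "good" pixels can be sketched as follows. This is a simplified stand-in, not the algorithm of Chapter 4: it uses a plain central-difference gradient (the kind of less complex estimator mentioned above) and simply keeps the N largest magnitudes instead of thresholding a gradient magnitude histogram. The function name is hypothetical.

```python
def select_reference_set(image, n_ref):
    """Return the (x, y) coordinates of the n_ref strongest-gradient pixels.

    image -- list of rows of pixel intensities; border pixels are skipped
             because the central difference needs both neighbors.
    """
    height, width = len(image), len(image[0])
    scored = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            scored.append(((gx * gx + gy * gy) ** 0.5, x, y))
    # Largest gradient magnitudes first.
    scored.sort(reverse=True)
    return [(x, y) for _, x, y in scored[:n_ref]]
```

For an image containing a single vertical step edge, every selected pixel lies in the two columns adjacent to the edge, which is the adjacency property discussed later in this section.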
The nonlinear peak elimination prefilter appears to be very
easy to implement in either hardware or software, and for low contrast
images seems to provide up to a factor of two reduction in noise
variance. However, for high contrast imagery the adaptive reference
set selection algorithm will incorporate into the reference set pixels
which lie on high gradient edges, and are less likely to have been
affected by the prefilter. Under this condition, the nonlinear peak
elimination filter may not provide its maximum potential benefit.

An important aspect of the integrated tracking algorithm is the
serendipitous behavior of the component parts. The non-linear peak
elimination prefilter reduces the random noise component of the
incoming data images. The adaptive reference set selection algorithm
maximizes the signal component of the reference set so that the minimum
distance registration is correct a higher percentage of the time, thus
reducing the average difference image sample variance. The adaptive
Kalman filter maintains a high quality (low noise) reference image and
estimates both the data image noise variance and the reference image
noise variance. The gradient magnitude estimator uses the Kalman
filter estimate of the reference image noise variance to control the
detection threshold and thus maintains a fixed probability of
erroneously including a bad pixel in the reference set. Since the
pixels in the reference set tend to lie along edges in the image, the
natural adjacency of the reference set pixels takes advantage of the
noise correlation that exists between the correct and incorrect trial
registrations and reduces the probability of selecting an incorrect
trial registration. The reduced noise component of the Kalman filtered
reference image further decreases the probability of error.
7.2 Recommended Future Work


Several areas of potentially fruitful research are apparent.
Since the smoothness of a two dimensional distance function determines
the appropriateness of the search type, an exhaustive search is
dictated where the trend information from adjacent trial registrations
is an unreliable indicator of the direction toward the minimum of the
distance function. It seems reasonable to expect that a roughness
parameter can be developed which is a function of the signal-to-noise
ratio. This parameter should indicate the probability that the minimum
distance lies in the direction indicated by a distance function
gradient measure associated with a particular trial registration.

The  development  of  techniques  to  determine  the  location  of  the 
registration  coordinates  by  interpolating  between  trial  registrations 
would  be  a useful  extension  of  this  research.  This  seems  to  have  some 
potential  for  reducing  error. 
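
One common way to interpolate between trial registrations, offered here only as an example of the kind of technique meant and not as a result of this research, is to fit a parabola through the distance function at the best trial registration and its two neighbors along one axis. The function name and argument order are assumptions.

```python
def subpixel_offset(d_minus, d_zero, d_plus):
    """Locate the parabola vertex through three distance samples.

    d_minus, d_zero, d_plus -- distance function at shifts -1, 0, +1
    relative to the best trial registration. Returns the fractional
    offset of the fitted minimum, or 0.0 if the samples do not form a
    minimum (flat or concave).
    """
    curvature = d_minus - 2.0 * d_zero + d_plus
    if curvature <= 0.0:
        return 0.0
    return 0.5 * (d_minus - d_plus) / curvature
```

Applying the function once along x and once along y gives a two-dimensional estimate, under the further assumption that the distance function is roughly separable near its minimum.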

The  question  of  when  to  extract  a new  reference  set  from  the 
reference  image  in  order  to  minimize  the  aimpoint  drift  rate  remains 
unanswered,  as  well  as  a number  of  questions  regarding  the  relative 
performance  of  trackers  employing  more  easily  computed  distance 
functions  (absolute  value  or  Hamming  distances  for  example). 
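
For concreteness, the distance functions mentioned above might be written as follows for a reference set and the corresponding data pixels at one trial registration. This is a sketch: the function names are hypothetical, pixels are assumed to be non-negative integers (for example six-bit values), and the Hamming distance is taken here to mean the number of differing bits between corresponding pixel values.

```python
def squared_distance(ref, data):
    """Sum of squared differences between corresponding pixels."""
    return sum((r - d) ** 2 for r, d in zip(ref, data))


def absolute_distance(ref, data):
    """Sum of absolute differences; needs no multiplications."""
    return sum(abs(r - d) for r, d in zip(ref, data))


def hamming_distance(ref, data):
    """Total number of differing bits between corresponding pixels."""
    return sum(bin(r ^ d).count("1") for r, d in zip(ref, data))
```

The second and third functions avoid the multiplier that the squared distance requires, which is the computational saving at issue.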

In  the  near  future  there  is  probably  a speed  advantage  to  be 
had  in  any  digital  processor  using  fixed  point  arithmetic.  As  a result 
there  are  questions  to  be  answered  regarding  the  appropriate  word 
length  and  scaling  to  be  used  in  mechanizing  the  adaptive  Kalman 
filter. 
Appendix I

Data  Characteristics 

Three image sequences were gathered for use in tracking
experiments.  Sample  images  from  each  sequence  are  illustrated  in 
Figure  56  through  Figure  62. 

All  image  sequences  were  obtained  from  vidicon  sensors, 
recorded  on  commercial  video  recorders,  and  transferred  to  a video  disc 
for  digitizing.  A subsection  128  pixels  wide  by  96  pixels  high  from 
alternate  T.V.  fields  was  then  digitized  to  six-bit  accuracy  with  an 
inter-pixel  separation  of  100  nanoseconds  in  the  scan  direction.  Total 
system video bandwidth was approximately 3.5 MHz. No correction was
made  for  nonlinearity  of  the  vidicon  input-output  transfer  function. 

For  purposes  of  reproduction  the  dynamic  range  of  the  images  was 
expanded  linearly  to  the  point  where  one  tenth  of  one  percent  of  the 
brightest  and  darkest  pixels  were  clipped.  To  obtain  the  original 
aspect for the images, tilt the page until the ratio of height to width
is  1.17.  Figure  64,  Figure  65,  and  Figure  66  present  the  distribution 
of  intensity  levels  for  the  first  frame  of  each  sequence. 
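
The linear expansion used for reproduction can be sketched as below. This is an illustration only: it assumes the one tenth of one percent applies separately to the darkest and to the brightest pixels, and the function name and the 0 to 255 output range are hypothetical.

```python
def stretch_for_display(pixels, clip_fraction=0.001, out_max=255):
    """Linearly expand pixel intensities, clipping a fraction at each end.

    pixels        -- flat list of intensities (for example 128 x 96 values)
    clip_fraction -- fraction of pixels clipped at each extreme (assumed)
    out_max       -- largest output level (assumed display range)
    """
    ordered = sorted(pixels)
    k = int(len(ordered) * clip_fraction)
    lo, hi = ordered[k], ordered[-1 - k]
    if hi == lo:
        return [0] * len(pixels)  # flat image: nothing to expand
    scale = out_max / (hi - lo)
    # Values outside [lo, hi] saturate at the ends of the output range.
    return [min(out_max, max(0, round((p - lo) * scale))) for p in pixels]
```

For a 128 by 96 image this clips 12 pixels at each end of the intensity distribution.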

Figure  67  shows  the  relationship  between  the  original  T.V. 
format  and  the  images  as  reproduced  here. 

Both  CARS  and  TREES  were  obtained  under  controlled  conditions 
with  a rigidly  mounted,  high  quality,  commercial  T.V.  camera.  The 
sequence  CARS  presents  a highway  intersection  with  cars  stopped  at  a 
red  light.  The  image  sequence  TREES  presents  a field  of  ripe  winter 
wheat  containing  some  weed  growth  with  a line  of  trees  in  the 
background.

AIRPLANE was obtained under less controlled conditions from a
small ruggedized T.V. camera mounted on the gimbal of a ground-based
aircraft tracking system. This image sequence presents an Air Force F-4
aircraft making a low pass over the tracking site. The aircraft is
moving with respect to the background, the sensor field of view is
roving with respect to the line-of-sight to the aircraft, and the
aircraft is changing in both aspect and apparent size during the
approximately three seconds represented by this image sequence.
Figure 57. TREES

Figure 60. AIRPLANE - frames 41 and 51 (panels: Frame 41, Frame 51)

Figure 64. Intensity Histograms for Frame 1
of CARS and TREES (panel: TREES FRAME 1)

Intensity Histograms for Frames 1
and 89 of AIRPLANE (panel: AIRPLANE FRAME 1)

Figure 67. Relationship Between Standard
Scan T.V. Field and Digitized Images


